ABSTRACT. We study the convergence rate to stationarity for a class of exchangeable
partition-valued Markov chains called cut-and-paste chains. The law governing the transitions
of a cut-and-paste chain is determined by products of i.i.d. stochastic matrices, which describe
the chain induced on the simplex by taking asymptotic frequencies. Using this representation,
we establish upper bounds for the mixing times of ergodic cut-and-paste chains, and
under certain conditions on the distribution of the governing random matrices we show
that the "cutoff phenomenon" holds.

1. Introduction 

A Markov chain $\{X_t\}_{t=0,1,2,\ldots}$ on the space $[k]^{\mathbb N}$ of $k$-colorings of the positive
integers $\mathbb N$ is said to be exchangeable if its transition law is equivariant with respect to
finite permutations of $\mathbb N$ (that is, permutations that fix all but finitely many elements of
$\mathbb N$). Exchangeability does not imply that the Markov chain has the Feller property (relative
to the product topology on $[k]^{\mathbb N}$), but if a Markov chain is both exchangeable and Feller then
it has a simple paintbox representation, as proved by Crane [3]. In particular, there exists a
sequence $\{S_t\}_{t \ge 1}$ of i.i.d. $k \times k$ random column-stochastic matrices (the paintbox sequence)
such that, conditional on the entire sequence $\{S_t\}_{t \ge 1}$ and on $X_0, X_1, \ldots, X_m$, the coordinate
random variables $\{X^i_{m+1}\}_{i \in [n]}$ are independent, and $X^i_{m+1}$ has the multinomial distribution
specified by the $X^i_m$ column of $S_{m+1}$. Equivalently (see Proposition 3.3 in section 3.3),
conditional on the paintbox sequence, the coordinate sequences $\{X^i_{m+1}\}_{m \ge 0}$ are independent,
time-inhomogeneous Markov chains on the state space $[k]$ with one-step transition probability
matrices $S_1, S_2, \ldots$. This implies that for any integer $n \ge 1$ the restriction $X^{[n]}_t$ of $X_t$
to the space $[k]^{[n]}$ is itself a Markov chain. We shall refer to such Markov chains $X_t$ and
$X^{[n]}_t$ as exchangeable Feller cut-and-paste chains, or EFCP chains for short. Under mild
hypotheses on the paintbox distribution (see the discussion in section 5) the restrictions
$X^{[n]}_t$ of EFCP chains to the finite configuration spaces $[k]^{[n]}$ are ergodic. The main results of
this paper, Theorems 1.1 and 1.2, relate the convergence rates of these chains to properties of
the paintbox process $S_1, S_2, \ldots$.

Theorem 1.1. Assume that for some $m \ge 1$ there is positive probability that all entries of the
matrix product $S_m S_{m-1} \cdots S_1$ are nonzero. Then the EFCP chain $X^{[n]}$ is ergodic, and it mixes
in $O(\log n)$ steps.

Theorem 1.2. Assume that the distribution of $S_1$ is absolutely continuous relative to Lebesgue
measure on the space of $k \times k$ column-stochastic matrices, with density of class $L^p$ for some
$p > 1$. Then the associated EFCP chains $X^{[n]}$ exhibit the cutoff phenomenon: there exists a
positive constant $\theta$ such that for all sufficiently small $\delta, \varepsilon > 0$ the (total variation)
mixing times satisfy

(1)  $(\theta - \delta)\log n \le t^{(n)}_{\mathrm{mix}}(\varepsilon) \le t^{(n)}_{\mathrm{mix}}(1 - \varepsilon) \le (\theta + \delta)\log n$

for all sufficiently large $n$.

Formal statements of these theorems will be given in due course (see Theorems 5.4 and
5.7 in section 5), and less stringent hypotheses for the $O(\log n)$ convergence rate will be
given. In the special case $k = 2$ the results are related to some classical results for random
walks on the hypercube, e.g. the Ehrenfest chain on $\{0, 1\}^n$: see Example 5.9.

The key to both results is that the relative frequencies of the different colors are
determined by the random matrix products $S_t S_{t-1} \cdots S_1$ (see Proposition 3.3). The hypotheses
of Theorem 1.1 ensure that these matrix products contract the $k$-simplex to a point at least
exponentially rapidly. The stronger hypotheses of Theorem 1.2 prevent the simplex from
collapsing at a faster than exponential rate.

The paper is organized as follows. In section 2 we record some simple and elementary
facts about total variation distance, and in section 3 we define cut-and-paste Markov chains
formally and establish the basic relation with the paintbox sequence (Proposition 3.3). In
section 4 we discuss the contractivity properties of products of random stochastic matrices.
In section 5 we prove the main results concerning ergodicity and mixing rates of
cut-and-paste chains, and in section 5.3 we discuss some examples of cut-and-paste chains not
covered by our main theorems. Finally, in section 6 we deduce mixing rate and cutoff for
projections of the cut-and-paste chain into the space of ordinary set partitions.

2. PRELIMINARIES: TOTAL VARIATION DISTANCE 

Since the state spaces of interest in our main results are finite, it is natural to use the
total variation metric to measure the distance between the law $\mathcal D(X_m)$ of the chain $X$ at
time $m \ge 1$ and its stationary distribution $\pi$. The total variation distance $\|\mu - \nu\|_{TV}$
between two probability measures $\mu, \nu$ on a finite or countable set $\mathcal X$ is defined by

(2)  $\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \mathcal X} |\mu(x) - \nu(x)| = \max_{B \subset \mathcal X} (\nu(B) - \mu(B))$.

The maximum is attained at $B^* = \{x : \nu(x) > \mu(x)\}$ and, since the indicator $1_{B^*}$ is a function
only of the likelihood ratio $d\nu/d\mu$, the total variation distance $\|\mu - \nu\|_{TV}$ is the same as the
total variation distance between the $\mu$- and $\nu$-distributions of any sufficient statistic. In
particular, if $Y = Y(x)$ is a random variable such that $d\nu/d\mu$ is a function of $Y$, then

(3)  $\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{y} |\nu(Y = y) - \mu(Y = y)|$,

where the sum is over all possible values of $Y(x)$.
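The two expressions in (2) can be compared numerically on a small example. The following is a minimal sketch, assuming measures are stored as Python dictionaries on a tiny finite set; the function names and the example measures are illustrative and not taken from the paper.

```python
from itertools import chain, combinations

def tv_half_l1(mu, nu):
    """Total variation distance as half the L1 distance, first expression in (2)."""
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)

def tv_max_over_sets(mu, nu):
    """Total variation distance as max over B of nu(B) - mu(B), by brute force.
    Only feasible for very small supports (2^|support| subsets)."""
    support = sorted(set(mu) | set(nu))
    subsets = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1)
    )
    return max(sum(nu.get(x, 0.0) - mu.get(x, 0.0) for x in B) for B in subsets)

# Hypothetical example measures on a three-point set.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}

print(tv_half_l1(mu, nu), tv_max_over_sets(mu, nu))
```

Here the maximizing set is $B^* = \{c\}$, the only point where $\nu$ exceeds $\mu$.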

Likelihood ratios provide a useful means for showing that two probability measures are 
close in total variation distance. 

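Lemma 2.1. Fix $\varepsilon > 0$, and define

  $B_\varepsilon := \{x : |\mu(x) - \nu(x)| > \varepsilon \nu(x)\}$.

If $\nu(B_\varepsilon) < \varepsilon$, then $\|\mu - \nu\|_{TV} < 2\varepsilon$.

Proof. By definition of $B_\varepsilon$,

  $B_\varepsilon^c = \{x : |\mu(x) - \nu(x)| \le \varepsilon \nu(x)\}$,

and so $(1 - \varepsilon)\nu(x) \le \mu(x) \le (1 + \varepsilon)\nu(x)$ for every $x \in B_\varepsilon^c$ and
$\mu(B_\varepsilon^c) \ge (1 - \varepsilon)\nu(B_\varepsilon^c)$. Since $\nu(B_\varepsilon) < \varepsilon$ by assumption, it follows that
$\mu(B_\varepsilon^c) \ge (1 - \varepsilon)^2$, hence $\mu(B_\varepsilon) \le 2\varepsilon - \varepsilon^2$, and

  $\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in B_\varepsilon} |\mu(x) - \nu(x)| + \frac{1}{2} \sum_{x \in B_\varepsilon^c} |\mu(x) - \nu(x)|$
  $\le \frac{1}{2} \sum_{x \in B_\varepsilon} (\mu(x) + \nu(x)) + \frac{1}{2} \sum_{x \in B_\varepsilon^c} \varepsilon \nu(x)$
  $< 2\varepsilon$.

□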



The convergence rates of EFCP chains will be (in the ergodic cases) determined by the
contractivity properties of products of random stochastic $k \times k$ matrices on the
$(k - 1)$-dimensional simplex

(4)  $\Delta_k := \{(s_1, \ldots, s_k)^T : s_i \ge 0 \text{ and } \sum_i s_i = 1\}$.



We now record some preliminary lemmas about convergence of probability measures on
$\Delta_k$ that we will need later. For each $n \in \mathbb N$ and each element $s \in \Delta_k$ define a probability
measure $\varrho^n_s$, the product multinomial-$s$ measure on $[k]^n$, by

(5)  $\varrho^n_s(x) := \prod_{j=1}^n s_{x^j}$  for $x = x^1 x^2 \cdots x^n \in [k]^n$.

Observe that the vector $m(x) := (m_1, \ldots, m_k)$ of cell counts defined by $m_j := \sum_{i=1}^n 1_j(x^i)$ is
sufficient for the likelihood ratio $\varrho^n_s(x)/\varrho^n_{s'}(x)$ of any two product-multinomial measures
$\varrho^n_s$ and $\varrho^n_{s'}$.
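The sufficiency of the cell counts can be seen concretely: the likelihood ratio of two product multinomial measures equals $\prod_j (s_j/s'_j)^{m_j}$, so it takes the same value on any two colorings with the same counts. The sketch below assumes colors are indexed $0, \ldots, k-1$ (rather than $1, \ldots, k$) and uses made-up vectors `s` and `s_prime`.

```python
from collections import Counter
from math import prod

def product_multinomial(s, x):
    """Probability of the coloring x under the product multinomial-s measure (5).
    s: probability vector indexed by colors 0..k-1; x: sequence of colors."""
    return prod(s[c] for c in x)

def cell_counts(x, k):
    """Vector (m_0, ..., m_{k-1}) of the number of sites of each color."""
    counts = Counter(x)
    return tuple(counts.get(j, 0) for j in range(k))

def likelihood_ratio_from_counts(s, s_prime, m):
    """Likelihood ratio written as a function of the cell counts only."""
    return prod((a / b) ** mj for a, b, mj in zip(s, s_prime, m))

# Hypothetical example data.
s = (0.5, 0.3, 0.2)
s_prime = (0.2, 0.2, 0.6)
x = (0, 2, 1, 0, 0, 2)
y = (2, 0, 0, 1, 2, 0)   # a rearrangement of x, hence the same cell counts

ratio_x = product_multinomial(s, x) / product_multinomial(s_prime, x)
ratio_y = product_multinomial(s, y) / product_multinomial(s_prime, y)
print(cell_counts(x, 3), ratio_x, ratio_y)
```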

Corollary 2.2. Fix $\delta, \varepsilon > 0$. If $s_n, s'_n$ are two sequences in $\Delta_k$ such that all coordinates of
$s_n, s'_n$ are in the interval $[\delta, 1 - \delta]$ for every $n$, and if $\|s_n - s'_n\|_\infty < n^{-1/2 - \varepsilon}$, then

  $\lim_{n \to \infty} \|\varrho^n_{s_n} - \varrho^n_{s'_n}\|_{TV} = 0$.

Proof. This is a routine consequence of Lemma 2.1, as the hypotheses ensure that the
likelihood ratio $d\varrho^n_{s_n}/d\varrho^n_{s'_n}$ is uniformly close to 1 with probability approaching 1 as $n \to \infty$. □

A similar argument can be used to establish the following generalization, which is
needed in the case of partitions with $k \ge 3$ classes. For $s_1, \ldots, s_k \in \Delta_k$, let
$\varrho^{n_1}_{s_1} \otimes \cdots \otimes \varrho^{n_k}_{s_k}$ denote the product measure on $[k]^{n_1 + \cdots + n_k}$ where the first $n_1$
coordinates are i.i.d. multinomial-$s_1$, the next $n_2$ are i.i.d. multinomial-$s_2$, and so on.

Corollary 2.3. Fix $\delta, \varepsilon > 0$. For each $i \in [k]$ let $\{s^i_n\}_{n \ge 1}$ and $\{t^i_n\}_{n \ge 1}$ be sequences in
$\Delta_k$ all of whose entries are in the interval $[\delta, 1 - \delta]$, and let $K^i_n$ be sequences of nonnegative
integers such that $\sum_i K^i_n = n$. If $\sum_{i=1}^k \|s^i_n - t^i_n\|_\infty < n^{-1/2 - \varepsilon}$, then

  $\lim_{n \to \infty} \| \varrho^{K^1_n}_{s^1_n} \otimes \cdots \otimes \varrho^{K^k_n}_{s^k_n} - \varrho^{K^1_n}_{t^1_n} \otimes \cdots \otimes \varrho^{K^k_n}_{t^k_n} \|_{TV} = 0$.

In dealing with probability measures that are defined as mixtures, the following simple
tool for bounding total variation distance is useful.

Lemma 2.4. Let $\mu, \nu$ be probability measures on a finite or countable space $\mathcal X$ that are both
mixtures with respect to a common mixing probability measure $\lambda(d\theta)$, that is, such that there
are probability measures $\mu_\theta$ and $\nu_\theta$ for which

  $\mu = \int \mu_\theta \, d\lambda(\theta)$  and  $\nu = \int \nu_\theta \, d\lambda(\theta)$.

If $\|\mu_\theta - \nu_\theta\|_{TV} < \varepsilon$ for all $\theta$ in a set of $\lambda$-probability $> 1 - \varepsilon$, then

  $\|\mu - \nu\|_{TV} < 2\varepsilon$.
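The bound in Lemma 2.4 follows from a short computation, sketched here under the lemma's hypotheses. The symbol $G$, denoting a set of parameters of $\lambda$-probability $> 1 - \varepsilon$ on which $\|\mu_\theta - \nu_\theta\|_{TV} < \varepsilon$, is notation introduced only for this sketch; the last step uses that total variation distance never exceeds 1.

```latex
\begin{align*}
\|\mu-\nu\|_{TV}
  &= \tfrac12\sum_{x\in\mathcal X}
     \Bigl|\int \bigl(\mu_\theta(x)-\nu_\theta(x)\bigr)\,d\lambda(\theta)\Bigr|
   \le \int \|\mu_\theta-\nu_\theta\|_{TV}\,d\lambda(\theta) \\
  &= \int_{G} \|\mu_\theta-\nu_\theta\|_{TV}\,d\lambda(\theta)
   + \int_{G^c} \|\mu_\theta-\nu_\theta\|_{TV}\,d\lambda(\theta)
   \le \varepsilon\,\lambda(G) + \lambda(G^c)
   < 2\varepsilon .
\end{align*}
```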

Lower bounds on total variation distance between two probabilities $\mu, \nu$ are often easier
to establish than upper bounds, because for this one need only find a particular set $B$ such
that $\mu(B) - \nu(B)$ is large. By (3) it suffices to look at sets of the form $B = \{Y \in F\}$, where $Y$ is
a sufficient statistic. The following lemma for product Bernoulli measures illustrates this
strategy. For $\alpha \in [0, 1]$, we write $\nu^n_\alpha := \varrho^n_s$, where $s := (\alpha, 1 - \alpha) \in \Delta_2$, to denote the product
Bernoulli measure determined by $\alpha$.

Lemma 2.5. Fix $\varepsilon > 0$. If $\alpha_m, \beta_m$ are sequences in $[0, 1]$ such that
$|\alpha_m - \beta_m| > m^{-1/2 + \varepsilon}$, then

  $\lim_{m \to \infty} \|\nu^m_{\alpha_m} - \nu^m_{\beta_m}\|_{TV} = 1$.

Proof. Without loss of generality assume that $\alpha_m < \beta_m$, and let $\gamma_m = (\alpha_m + \beta_m)/2$. Denote
by $S_m$ the sum of the coordinate variables. Then by Chebyshev's inequality,

  $\lim_{m \to \infty} \nu^m_{\alpha_m}\{S_m < m\gamma_m\} = 1$  and

  $\lim_{m \to \infty} \nu^m_{\beta_m}\{S_m < m\gamma_m\} = 0$.

□
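Because the sum $S_m$ is a sufficient statistic, the total variation distance in Lemma 2.5 equals the distance between two binomial laws, which can be evaluated exactly. The sketch below is an illustration only: the choices $\varepsilon = 0.1$, $\alpha_m = 1/2$ and a gap of $1.01\, m^{-1/2+\varepsilon}$ are assumptions made for the example, and the function names are hypothetical.

```python
from math import exp, lgamma, log

def binomial_pmf(m, p):
    """Binomial(m, p) probabilities for 0 < p < 1, computed in log space
    to avoid overflow of the binomial coefficients."""
    log_m_factorial = lgamma(m + 1)
    return [
        exp(log_m_factorial - lgamma(j + 1) - lgamma(m - j + 1)
            + j * log(p) + (m - j) * log(1 - p))
        for j in range(m + 1)
    ]

def tv_product_bernoulli(m, alpha, beta):
    """TV distance between the m-fold product Bernoulli(alpha) and Bernoulli(beta)
    measures, computed through the sufficient statistic S_m as in (3)."""
    pa, pb = binomial_pmf(m, alpha), binomial_pmf(m, beta)
    return 0.5 * sum(abs(a - b) for a, b in zip(pa, pb))

eps = 0.1                      # illustrative value of epsilon
for m in (50, 500, 2000):
    alpha = 0.5
    beta = alpha + 1.01 * m ** (-0.5 + eps)   # gap slightly above m^(-1/2+eps)
    print(m, round(tv_product_bernoulli(m, alpha, beta), 4))
```

The printed distances increase with $m$, in line with the limit asserted by the lemma; the convergence is slow when $\varepsilon$ is small.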

Remark 2.6. Similar results hold for multinomial and product-multinomial sampling. (A)
If $s_n, s'_n \in \Delta_k$ are distinct sequences of probability distributions on $[k]$ such that for some
coordinate $i \in [k]$ the $i$th entries of $s_n$ and $s'_n$ differ by at least $n^{-1/2 + \varepsilon}$, then

  $\lim_{n \to \infty} \|\varrho^n_{s_n} - \varrho^n_{s'_n}\|_{TV} = 1$.

(B) If $s^i_n, t^i_n \in \Delta_k$ are distinct sequences of probability distributions on $[k]$ such that for
some pair $i, j \in [k]$ the $j$th entries of $s^i_n$ and $t^i_n$ differ by at least $n^{-1/2 + \varepsilon}$, then for any
sequences $K^i_n$ of nonnegative integers such that $\sum_i K^i_n = n$,

  $\lim_{n \to \infty} \| \varrho^{K^1_n}_{s^1_n} \otimes \cdots \otimes \varrho^{K^k_n}_{s^k_n} - \varrho^{K^1_n}_{t^1_n} \otimes \cdots \otimes \varrho^{K^k_n}_{t^k_n} \|_{TV} = 1$.

These statements follow directly from Lemma 2.5 by projection on the appropriate
coordinate variable.

3. Preliminaries: CP chains and Paintbox Representation

3.1. Labeled and unlabeled partitions. For $k, n \in \mathbb N = \{1, 2, \ldots\}$, a labeled $k$-ary partition
$L$ of $[n]$ is an ordered collection $L := (L_1, \ldots, L_k)$ of disjoint subsets for which
$\bigcup_{i=1}^k L_i = [n]$. An unlabeled $k$-ary partition of $[n]$ is an unordered collection
$L := \{L_1, \ldots, L_r\}$, where $r \le k$, of nonempty, disjoint subsets whose union is $[n]$. The set
$\mathcal L_{[n]:k}$ of labeled $k$-ary partitions of $[n]$ can be naturally identified with the set $[k]^n$ of
$k$-colorings of the set $[n]$, via the map

  $L \mapsto l^1 l^2 \cdots l^n$  where  $l^i = j \iff i \in L_j$.

Thus, the multinomial-$s$ measure $\varrho^n_s$ defined in the previous section induces a measure
on $\mathcal L_{[n]:k}$, which we will also denote by $\varrho^n_s$. There is an obvious and natural projection
$\Pi_n : \mathcal L_{[n]:k} \to \mathcal P_{[n]:k}$ from the set $\mathcal L_{[n]:k}$ of labeled partitions to the set $\mathcal P_{[n]:k}$ of
unlabeled partitions given by

(6)  $\Pi_n(L) := \{L_1, \ldots, L_k\} \setminus \{\emptyset\}$.

This mapping coincides with the natural projection

  $\Pi_n : [k]^n \to [k]^n/\sim$

where $\sim$ is the equivalence relation $l^1 l^2 \cdots l^n \sim l_*^1 l_*^2 \cdots l_*^n$ if and only if there exists a
permutation $\sigma$ of $[k]$ such that $l^i = \sigma(l_*^i)$ for each $i \in [n]$. Some of the Markov chains on
$\mathcal L_{[n]:k}$ considered below have transition laws invariant under such permutations $\sigma$ of the
labels $[k]$, and in such cases the Markov chain projects via $\Pi_n$ to a Markov chain on the state
space $\mathcal P_{[n]:k}$. This is discussed further in section 6 below.

3.2. Matrix operations on $\mathcal L_{[\infty]:k}$. The cut-and-paste Markov chain on $\mathcal L_{[n]:k}$ can be
described by a product of i.i.d. random set-valued matrices with a special structural property.

Definition 3.1. For any subset $S \subset \mathbb N$, define a $k$-ary (or $k \times k$) partition matrix over $S$
to be a $k \times k$ matrix $M$ whose entries $M_{ij}$ are subsets of $S$ such that every column $M^j$ is
a labeled $k$-ary partition of $S$. For any two $k$-ary partition matrices $M, M'$, define the
product $M * M' = MM'$ by

(7)  $(M * M')_{ij} = (MM')_{ij} := \bigcup_{1 \le l \le k} (M_{il} \cap M'_{lj})$,  for all $1 \le i, j \le k$.

We write $\mathcal M_{[n]:k}$ to denote the space of $k \times k$ partition matrices of $[n]$. Observe that the
matrix product defined by (7) makes sense for matrices with entries in any distributive
lattice, provided $\cup, \cap$ are replaced by the lattice operations.

As each column of any $M \in \mathcal M_{[n]:k}$ is a $k$-ary partition of $[n]$, the set $\mathcal M_{[n]:k}$ of $k$-ary
partition matrices over $[n]$ can be identified with $\mathcal L_{[n]:k}^k$. Furthermore, a $k$-ary partition
matrix $M$ induces a mapping $M : \mathcal L_{[n]:k} \to \mathcal L_{[n]:k}$ by

  $(ML)_i = \bigcup_{j} (M_{ij} \cap L_j)$.

Lemma 3.2. Let $k, n \in \mathbb N$. Then

(i) for each $L \in \mathcal L_{[n]:k}$, $ML \in \mathcal L_{[n]:k}$ for all $M \in \mathcal M_{[n]:k}$;

(ii) for any $L, L' \in \mathcal L_{[n]:k}$, there exists $M \in \mathcal M_{[n]:k}$ such that $ML = L'$;

(iii) the pair $(\mathcal M_{[n]:k}, *)$ is a monoid (i.e., semigroup with identity) for every $n \in \mathbb N$.
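The product (7) and the action of a partition matrix on a labeled partition can be written out with Python sets. This is a minimal sketch for experimentation, assuming matrices are stored as lists of rows of sets over $\{1, \ldots, n\}$; the helper names are illustrative and do not come from the paper.

```python
import random

def random_partition_matrix(n, k, rng):
    """A k x k matrix of subsets of {1,...,n} whose columns are labeled k-ary partitions."""
    M = [[set() for _ in range(k)] for _ in range(k)]
    for j in range(k):                 # column j
        for i in range(1, n + 1):      # element i goes to exactly one row of column j
            M[rng.randrange(k)][j].add(i)
    return M

def product(M, Mp):
    """(M * M')_{ij} = union over l of (M_{il} & M'_{lj}), as in (7)."""
    k = len(M)
    return [[set().union(*(M[i][l] & Mp[l][j] for l in range(k))) for j in range(k)]
            for i in range(k)]

def act(M, L):
    """(ML)_i = union over j of (M_{ij} & L_j)."""
    k = len(M)
    return [set().union(*(M[i][j] & L[j] for j in range(k))) for i in range(k)]

def identity(n, k):
    """Diagonal entries equal to [n], off-diagonal entries empty."""
    full = set(range(1, n + 1))
    return [[set(full) if i == j else set() for j in range(k)] for i in range(k)]

def is_labeled_partition(L, n):
    """Blocks are disjoint and cover {1,...,n} (empty blocks are allowed)."""
    return sum(len(b) for b in L) == n and set().union(*L) == set(range(1, n + 1))

rng = random.Random(0)
n, k = 8, 3
M = random_partition_matrix(n, k, rng)
Mp = random_partition_matrix(n, k, rng)
L = [row[0] for row in random_partition_matrix(n, k, rng)]  # a column is a labeled partition

print(act(M, L))
```

The checks one would expect from Lemma 3.2 are that `act(M, L)` is again a labeled partition, that the identity acts trivially, and that `act(product(M, Mp), L)` agrees with `act(M, act(Mp, L))`.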

The proof is elementary and follows mostly from the definition (7) (the semigroup
identity is the matrix whose diagonal entries are all $[n]$ and whose off-diagonal entries are $\emptyset$).
We now describe the role of the semigroup $(\mathcal M_{[n]:k}, *)$ in describing the transitions of the
cut-and-paste Markov chain.

3.3. Cut-and-paste Markov chains. Fix $n, k \in \mathbb N$, let $\mu$ be a probability measure on
$\mathcal M_{[n]:k}$, and let $\varrho_0$ be a probability measure on $\mathcal L_{[n]:k}$. The cut-and-paste Markov chain
$X = (X_m)_{m \ge 0}$ on $\mathcal L_{[n]:k}$ with initial distribution $\varrho_0$ and directing measure $\mu$ is constructed as
follows. Let $X_0 \sim \varrho_0$ and, independently of $X_0$, let $M_1, M_2, \ldots$ be i.i.d. according to $\mu$. Define

(8)  $X_m = M_m X_{m-1} = M_m M_{m-1} \cdots M_1 X_0$,  for $m \ge 1$.

We call any Markov chain with the above dynamics a $CP_n(\mu; \varrho_0)$ chain, or simply a $CP_n(\mu)$
chain if the initial distribution is unspecified. Henceforth we will use the notation $X^i_m$ to
denote the $i$th coordinate variable in $X_m$ (that is, $X^i_m$ is the color of the site $i \in [n]$ when $X_m$
is viewed as an element of $[k]^{[n]}$).

Our main results concern the class of cut-and-paste chains whose directing measures
$\mu = \mu_\Sigma$ are mixtures of product multinomial measures $\mu_S$, where $S$ ranges over the set
$\Delta_k^k$ of $k \times k$ column-stochastic matrices. For any $S \in \Delta_k^k$, the product multinomial measure
$\mu_S$ is defined by

(9)  $\mu_S(M) := \prod_{j=1}^k \prod_{i=1}^n S(M^j(i), j)$  for $M \in \mathcal M_{[n]:k}$,

where $M^j(i) = \sum_r r 1\{i \in M_{rj}\}$ denotes the index $r$ of the row such that $i$ is an element of $M_{rj}$.
(In other words, the columns of $M \sim \mu_S$ are independent labeled $k$-ary partitions, and in
each column $M^j$ the elements $i \in [n]$ are independently assigned to rows $r \in [k]$ according
to draws from the multinomial distribution determined by the $j$th column of $S$.) For any
Borel probability measure $\Sigma$ on $\Delta_k^k$, we write $\mu_\Sigma$ to denote the $\Sigma$-mixture of the measures
$\mu_S$ on $\mathcal M_{[n]:k}$, that is,

(10)  $\mu_\Sigma(\cdot) := \int \mu_S(\cdot) \, \Sigma(dS)$.

Crane [3] has shown that every exchangeable, Feller Markov chain on the space
of $k$-colorings of the positive integers is a cut-and-paste chain with directing measure of
the form (10), and so henceforth, we shall refer to such chains as exchangeable Feller
cut-and-paste chains, or EFCP chains for short.

An EFCP chain on $[k]^{[n]}$ (or $[k]^{\mathbb N}$) with directing measure $\mu = \mu_\Sigma$ can be constructed
in two steps, as follows. First, choose i.i.d. stochastic matrices $S_1, S_2, \ldots$ with law $\Sigma$, all
independent of $X_0$; second, given $X_0, S_1, S_2, \ldots$, let $M_1, M_2, \ldots$ be conditionally independent
$k$-ary partition matrices with laws $M_i \sim \mu_{S_i}$ for each $i = 1, 2, \ldots$, and define the
cut-and-paste chain $X_m$ by equation (8). This construction is fundamental to our arguments, and so
henceforth, when considering an EFCP chain with directing measure $\mu_\Sigma$, we shall assume
that it is defined on a probability space together with a paintbox sequence $S_1, S_2, \ldots$.
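The two-step construction can be simulated directly on colorings. The sketch below makes several assumptions that are not in the paper: colors are indexed $0, \ldots, k-1$, the law $\Sigma$ is taken to have i.i.d. columns uniform on the simplex (obtained by normalizing exponential variables), and a partition matrix is stored through the table `row_of[j][i]`, the row of column $j$ that contains site $i$.

```python
import random

def random_column_stochastic(num_colors, rng):
    """k x k matrix S with i.i.d. columns uniform on the simplex (illustrative choice
    of the paintbox law). Returned as a list of rows, so S[r][j] is the (r, j) entry."""
    cols = []
    for _ in range(num_colors):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(num_colors)]
        total = sum(g)
        cols.append([v / total for v in g])
    return [[cols[j][r] for j in range(num_colors)] for r in range(num_colors)]

def sample_partition_matrix(S, n, rng):
    """Draw M from the product multinomial measure (9): in column j, each site is
    assigned independently to row r with probability S[r][j]."""
    num_colors = len(S)
    return [
        rng.choices(range(num_colors), weights=[S[r][j] for r in range(num_colors)], k=n)
        for j in range(num_colors)
    ]

def cut_and_paste_step(row_of, x):
    """Apply M to the coloring x: a site i of color x[i] receives as its new color
    the row of column x[i] that contains i."""
    return [row_of[x[i]][i] for i in range(len(x))]

def simulate_efcp(x0, num_colors, steps, rng):
    path = [list(x0)]
    for _ in range(steps):
        S = random_column_stochastic(num_colors, rng)        # step 1: paintbox matrix
        row_of = sample_partition_matrix(S, len(x0), rng)     # step 2: M drawn given S
        path.append(cut_and_paste_step(row_of, path[-1]))     # X_m = M_m X_{m-1}, as in (8)
    return path

rng = random.Random(0)
path = simulate_efcp([0] * 10, num_colors=3, steps=5, rng=rng)
for state in path:
    print(state)
```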

For each $m \in \mathbb N$, set

(11)  $Q_m := S_m S_{m-1} \cdots S_1$.

Note that $Q_m$ is itself a stochastic matrix. Denote by $\mathcal S$ the $\sigma$-algebra generated by the
paintbox sequence $S_1, S_2, \ldots$.

Proposition 3.3. Given $\mathcal G := \sigma(X_0) \vee \mathcal S$, the $n$ coordinate sequences $(X^i_m)_{m \ge 0}$, where
$i \in [n]$, are conditionally independent versions of a time-inhomogeneous Markov chain on $[k]$ with
one-step transition probability matrices $S_1, S_2, \ldots$. Thus, in particular, for each $m \ge 1$,

(12)  $P(X^i_m = x^i_m \text{ for each } i \in [n] \mid \mathcal G) = \prod_{i=1}^n Q_m(x^i_m, X^i_0)$.

Proof. We prove that the Markov property holds by induction on $m$. The case $m = 1$
follows directly by (9), as this implies that, conditional on $\mathcal G$, the coordinate random
variables $X^i_1$ are independent, with multinomial marginal conditional distributions given by
the columns of $S_1$. Assume, then, that the assertion is true for some $m \ge 1$. Let $\mathcal F_m$ be the
$\sigma$-algebra generated by $\mathcal G$ and the random matrices $M_1, M_2, \ldots, M_m$. Since the specification
(8) expresses $X_m$ as a function of $X_0, M_1, M_2, \ldots, M_m$, the random variables $X_t$, where $t \le m$,
are measurable with respect to $\mathcal F_m$. Moreover, given $\mathcal G$ the random matrix $M_{m+1}$ is
conditionally independent of $\mathcal F_m$, with conditional distribution (9) where $S = S_{m+1}$. Equation (9)
implies that, conditional on $\mathcal G$, the columns $M^c_{m+1}$ of $M_{m+1}$ are independent $k$-ary partitions
obtained by independent multinomial-$S^c$ sampling. Consequently,

  $P(X^i_{m+1} = x^i_{m+1} \ \forall i \in [n] \mid \mathcal F_m) = P((M_{m+1} X_m)^i = x^i_{m+1} \ \forall i \in [n] \mid \mathcal F_m)$
  $= P((M_{m+1} X_m)^i = x^i_{m+1} \ \forall i \in [n] \mid \mathcal G \vee \sigma(X_m))$
  $= \prod_{i=1}^n S_{m+1}(x^i_{m+1}, X^i_m)$,

the second equality by the induction hypothesis and the third by definition of the
probability measure $\mu_{S_{m+1}}$. This proves the first assertion of the proposition. The equation (12)
follows directly. □

Proposition 3.3 shows that for any $n \ge 1$ a version of the EFCP on $[k]^{[n]}$ can be
constructed by first generating a paintbox sequence $S_m$ and then, conditional on $\mathcal S$, running
independent, time-inhomogeneous Markov chains $X^i_m$ with one-step transition probability
matrices $S_m$. From this construction it is evident that a version of the EFCP on the infinite
state space $[k]^{\mathbb N}$ can be constructed by running countably many conditionally independent
Markov chains $X^i_m$, and that for any $n \in \mathbb N$ the projection of this chain to the first $n$
coordinates is a version of the EFCP on $[k]^{[n]}$.
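Equation (12) can be checked by Monte Carlo for a fixed realization of the paintbox sequence: running many independent coordinate chains from the same initial color, the empirical color frequencies at time $m$ should be close to the corresponding column of $Q_m$. The following sketch assumes colors $0, \ldots, k-1$ and, purely for illustration, paintbox matrices with i.i.d. columns uniform on the simplex; none of the names come from the paper.

```python
import random

def random_column_stochastic(num_colors, rng):
    """S[r][j] with i.i.d. columns uniform on the simplex (illustrative paintbox law)."""
    cols = []
    for _ in range(num_colors):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(num_colors)]
        total = sum(g)
        cols.append([v / total for v in g])
    return [[cols[j][r] for j in range(num_colors)] for r in range(num_colors)]

def matmul(A, B):
    size = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]

def run_coordinate(S_list, start, rng):
    """One coordinate chain: from color c, jump to color r with probability S[r][c]."""
    c = start
    for S in S_list:
        c = rng.choices(range(len(S)), weights=[S[r][c] for r in range(len(S))])[0]
    return c

rng = random.Random(1)
num_colors, m, n = 3, 4, 20000
S_list = [random_column_stochastic(num_colors, rng) for _ in range(m)]   # S_1, ..., S_m

Q = S_list[0]
for S in S_list[1:]:
    Q = matmul(S, Q)            # Q_m = S_m S_{m-1} ... S_1, as in (11)

finals = [run_coordinate(S_list, 0, rng) for _ in range(n)]
empirical = [finals.count(r) / n for r in range(num_colors)]
predicted = [Q[r][0] for r in range(num_colors)]   # column of Q_m for initial color 0
print(empirical)
print(predicted)
```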

4. Random stochastic matrix products 

For any EFCP chain $\{X_m\}_{m \ge 0}$, Proposition 3.3 directly relates the conditional distribution
of $X_m$ to the product $Q_m = S_m S_{m-1} \cdots S_1$ of i.i.d. random stochastic matrices. Thus, the
rates of convergence of these chains are at least implicitly determined by the contractivity
properties of the random matrix products $Q_m$. The asymptotic behavior of i.i.d. random
matrix products has been thoroughly investigated, beginning with the seminal paper of
Furstenberg and Kesten [4]: see [1] and [5] for extensive reviews. However, the random
matrices $S_i$ that occur in the paintbox representation of the $CP_n(\mu_\Sigma)$ chain are not
necessarily invertible, so much of the theory developed in [1] and [5] doesn't apply. On the other
hand, the random matrices $S_t$ are column-stochastic, and so the deeper results of [1] and [5]
are not needed here. In this section we collect the results concerning the contraction rates
of the products $Q_m$ needed for the study of the EFCP chains, and give elementary proofs
of these results.

Throughout this section assume that $\{S_i\}_{i \ge 1}$ is a sequence of independent, identically
distributed $k \times k$ random column-stochastic matrices, with common distribution $\Sigma$, and let

  $Q_m = S_m S_{m-1} \cdots S_1$.

4.1. Asymptotic Collapse of the Simplex. In the theory of random matrix products, a
central role is played by the induced action on projective space. In the theory of products
of random stochastic matrices an analogous role is played by the action of the matrices on
the simplex $\Delta_k$. By definition, the simplex $\Delta_k$ consists of all convex combinations of the
unit vectors $e_1, e_2, \ldots, e_k$ of $\mathbb R^k$; since each column of a $k \times k$ column-stochastic matrix
$S \in \Delta_k^k$ lies in $\Delta_k$, the mapping $v \mapsto Sv$ preserves $\Delta_k$. This mapping is contractive in the
sense that it is Lipschitz (relative to the usual Euclidean metric on $\mathbb R^k$) with Lipschitz
constant $\le 1$.

The simplex $\Delta_k$ is contained in a translate of the $(k - 1)$-dimensional vector subspace
$V = V_k$ of $\mathbb R^k$ consisting of all vectors orthogonal to the vector $\mathbf 1 = (1, 1, \ldots, 1)^T$ (equivalently,
the subspace with basis $e_i - e_{i+1}$ where $1 \le i \le k - 1$). Any stochastic matrix $A$ leaves
the subspace $V$ invariant, and hence induces a linear transformation $A|V : V \to V$. Since
this transformation is contractive, its singular values are all between 0 and 1. (Recall that the
singular values of a $d \times d$ matrix $S$ are the square roots of the eigenvalues of the nonnegative
definite matrix $S^T S$. Equivalently, they are the lengths of the principal axes of the ellipsoid
$S(\mathbb S^{d-1})$, where $\mathbb S^{d-1}$ is the unit sphere in $\mathbb R^d$.) Denote the singular values of the restriction
$Q_n|V$ by

(13)  $1 \ge \lambda_{n,1} \ge \lambda_{n,2} \ge \cdots \ge \lambda_{n,k-1} \ge 0$.

Because the induced mapping $Q_n : \Delta_k \to \Delta_k$ is affine, its Lipschitz constant is just the
largest singular value $\lambda_{n,1}$.

Proposition 4.1. Let $(S_i)_{i \ge 1}$ be independent, identically distributed $k \times k$ column-stochastic
random matrices, and let $Q_m = S_m S_{m-1} \cdots S_1$. Then

(14)  $\lim_{m \to \infty} \mathrm{diameter}(Q_m(\Delta_k)) = 0$

if and only if there exists $m \ge 1$ such that with positive probability the largest singular value
$\lambda_{m,1}$ of $Q_m|V$ is strictly less than 1. In this case,

(15)  $\limsup_{m \to \infty} \mathrm{diameter}(Q_m(\Delta_k))^{1/m} < 1$  almost surely.

Proof. In order that the asymptotic collapse property (14) holds it is necessary that for some
$m$ the largest singular value of $Q_m|V$ be less than one. (If not then for each $m$ there would
exist points $u_m, v_m \in \Delta_k$ such that the length of $Q_m(u_m - v_m)$ is at least the length of
$u_m - v_m$; but this would contradict (14).) Conversely, if for some $\varepsilon > 0$ the largest singular
value of $Q_m|V$ is less than $1 - \varepsilon$ with positive probability then with probability 1 infinitely
many of the matrix products $S_{mn+m} S_{mn+m-1} \cdots S_{mn+1}$ have largest singular value less than
$1 - \varepsilon$. Hence, the Lipschitz constant of the mapping on $\Delta_k$ induced by $Q_{mn}$ must converge to 0
as $n \to \infty$. In fact even more is true: the asymptotic fraction as $n \to \infty$ of blocks where
$S_{mn+m} S_{mn+m-1} \cdots S_{mn+1}$ has largest singular value $< 1 - \varepsilon$ is positive, by the strong law of
large numbers, and so the Lipschitz constant of $Q_{mn} : \Delta_k \to \Delta_k$ decays exponentially. □
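The collapse in (14) and the exponential rate in (15) are easy to observe numerically. Since $Q_m(\Delta_k)$ is the convex hull of the columns of $Q_m$, its diameter is the largest distance between two columns. The sketch below assumes, for illustration only, paintbox matrices with i.i.d. columns uniform on the simplex (so all entries are positive); the helper names are hypothetical.

```python
import random
from math import dist

def random_column_stochastic(k, rng):
    """S[r][j] with i.i.d. columns uniform on the simplex (illustrative paintbox law)."""
    cols = []
    for _ in range(k):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        total = sum(g)
        cols.append([v / total for v in g])
    return [[cols[j][r] for j in range(k)] for r in range(k)]

def matmul(A, B):
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)] for i in range(k)]

def simplex_image_diameter(Q):
    """Diameter of Q(simplex): the image is the convex hull of the columns of Q,
    so the diameter is the largest Euclidean distance between two columns."""
    k = len(Q)
    cols = [[Q[r][j] for r in range(k)] for j in range(k)]
    return max(dist(a, b) for a in cols for b in cols)

rng = random.Random(2)
k, steps = 3, 40
Q = random_column_stochastic(k, rng)                 # Q_1 = S_1
diameters = [simplex_image_diameter(Q)]
for _ in range(steps - 1):
    Q = matmul(random_column_stochastic(k, rng), Q)  # Q_{m+1} = S_{m+1} Q_m
    diameters.append(simplex_image_diameter(Q))

rate = diameters[-1] ** (1.0 / steps)   # empirical analogue of the quantity in (15)
print(diameters[0], diameters[-1], rate)
```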

Hypothesis 4.2. For some integer $m \ge 1$ the event that all entries of $Q_m$ are positive has positive
probability.

Corollary 4.3. Hypothesis 4.2 implies the asymptotic collapse property (14).

Proof. It is well known that if a stochastic matrix has all entries strictly positive then its
only eigenvalue of modulus 1 is 1, and this eigenvalue is simple (see, for instance, the
discussion of the Perron-Frobenius theorem in the appendix of 0). Consequently, if $Q_m$
has all entries positive then $\lambda_{m,1} < 1$. □

4.2. The induced Markov chain on the simplex. The sequence of random matrix
products $(Q_m)_{m \ge 1}$ induces a Markov chain on the simplex $\Delta_k$ in the obvious way: for any initial
vector $Y_0 \in \Delta_k$ independent of the sequence $(S_m)_{m \ge 1}$, put

(16)  $Y_m = Q_m Y_0$.

That the sequence $\{Y_m\}_{m \ge 0}$ is a Markov chain follows from the assumption that the matrices
$S_i$ are i.i.d. Since matrix multiplication is continuous, the induced Markov chain is Feller
(relative to the usual topology on $\Delta_k$). Consequently, since $\Delta_k$ is compact, the induced chain
has a stationary distribution, by the usual Bogoliubov-Krylov argument (see, e.g., [10]).

Proposition 4.4. The stationary distribution of the induced Markov chain on the simplex is unique
if and only if the asymptotic collapse property (14) holds.

Proof of sufficiency. Let $\pi$ be a stationary distribution, and let $Y_0 \sim \pi$ and $\tilde Y_0$ be random
elements of $\Delta_k$ that are independent of the sequence $\{Q_m\}_{m \ge 1}$. Define $Y_m = Q_m Y_0$ and
$\tilde Y_m = Q_m \tilde Y_0$. Both sequences $\{Y_m\}_{m \ge 0}$ and $\{\tilde Y_m\}_{m \ge 0}$ are versions of the induced chain, and
since the distribution of $Y_0$ is stationary, $Y_m \sim \pi$ for every $m \ge 0$. But the asymptotic
collapse property (14) implies that as $m \to \infty$,

  $d(Y_m, \tilde Y_m) \to 0$,

so the distribution of $\tilde Y_m$ approaches $\pi$ weakly as $m \to \infty$.

□

The converse is somewhat more subtle. Recall that the linear subspace $V = V_k$
orthogonal to the vector $\mathbf 1$ is invariant under multiplication by any stochastic matrix. Define
$U \subset V$ to be the set of unit vectors $u$ in $V$ such that $\|Q_m u\| = \|u\|$ almost surely for every
$m \ge 1$. Clearly, the set $U$ is a closed subset of the unit sphere in $V$, and it is also invariant,
that is, $Q_m(U) \subset U$ almost surely.

Lemma 4.5. The set $U$ is empty if and only if the asymptotic collapse property (14) holds.

Proof. If (14) holds then $\lim_{m \to \infty} \lambda_{m,1} = 0$, and so $\|Q_m u\| \to 0$ a.s. for every unit vector
$u \in V$. Thus, $U = \emptyset$.

To prove the converse statement, assume that the asymptotic collapse property (14) fails.
Then by Proposition 4.1 for each $m \ge 1$ the largest singular value of $Q_m|V$ is $\lambda_{m,1} = 1$,
and consequently there exist (possibly random) unit vectors $v_m \in V$ such that $\|Q_m v_m\| = 1$.
Since each matrix $S_i$ is contractive, it follows that $\|Q_m v_{m+n}\| = 1$ for all $m, n \ge 1$. Hence,
by the compactness of the unit sphere and the continuity of the maps $Q_m|V$, there exists a
possibly random unit vector $u$ such that $\|Q_m u\| = 1$ for every $m \ge 1$.

We will now show that there exists a non-random unit vector $u$ such that $\|Q_m u\| = 1$ for
every $m$, almost surely. Suppose to the contrary that there were no such $u$. For each unit
vector $u$, let $p_m(u)$ be the probability that $\|Q_m u\| < 1$. Since the matrices $S_m$ are weakly
contractive, for any unit vector $u$ the events $\|Q_m u\| = 1$ are decreasing in $m$, and so $p_m(u)$ is
non-decreasing. Hence, by a subsequence argument, if for every $m \ge 1$ there were a unit
vector $u_m$ such that $p_m(u_m) = 0$, then there would be a unit vector $u$ such that $p_m(u) = 0$
for every $m$. But by assumption there is no such $u$; consequently, there must be some finite
$m \ge 1$ such that $p_m(u) > 0$ for every unit vector.

HARRY CRANE AND STEVEN R LALLEY

For each fixed $m$, the function $p_m(u)$ is lower semi-continuous (by the continuity of
matrix multiplication), and therefore attains a minimum on the unit sphere of $V$. Since $p_m$ is
strictly positive, it follows that there exists $\delta > 0$ such that $p_m(u) \ge \delta$ for every unit vector
$u$. But if this is the case then there can be no random unit vector $u$ such that $\|Q_m u\| = 1$ for
every $m \ge 1$, because for each $m$ the event that $\|Q_{m+1} u\| < \|Q_m u\|$ would have conditional
probability (given $S_1, S_2, \ldots, S_m$) at least $\delta$. □

Proof of necessity in Proposition 4.4. If the asymptotic collapse property (14) fails, then by
Lemma 4.5 there exists a unit vector $u \in V$ such that $\|Q_m u\| = 1$ for all $m \ge 1$, almost surely.
Hence, since $\Delta_k$ is contained in a translate of $V$, there exist distinct $\mu, \nu \in \Delta_k$ such that
$\|Q_m(\mu - \nu)\| = \|\mu - \nu\|$ for all $m \ge 1$, a.s. By compactness, there exists such a pair
$(\mu, \nu) \in \Delta_k^2$ for which $\|\mu - \nu\|$ is maximal. Fix such a pair $(\mu, \nu)$, and let $A \subset \Delta_k^2$ be the
set of all pairs $(y, z)$ such that

  $\|S_1 y - S_1 z\| = \|\mu - \nu\|$  a.s.

Note that the set $A$ is closed, and consequently compact. Furthermore, because $\mu, \nu$ have
been chosen so that $\|\mu - \nu\|$ is maximal, for any pair $(y, z) \in A$ the points $y$ and $z$ must both
lie in the boundary $\partial \Delta_k$ of the simplex.

Define $Y_m = Q_m \mu$, $Z_m = Q_m \nu$, and $R_m = (Y_m + Z_m)/2$. By construction, for each $m \ge 0$
the pair $(Y_m, Z_m)$ lies in the set $A$. The sequence $(Y_m, Z_m, R_m)$ is a $\Delta_k^3$-valued Markov chain,
each of whose projections on $\Delta_k$ is a version of the induced chain. Since $\Delta_k^3$ is compact, the
Bogoliubov-Krylov argument implies that the Markov chain $(Y_m, Z_m, R_m)$ has a stationary
distribution $\lambda$ whose projection $\lambda_{Y,Z}$ on the first two coordinates is supported by $A$. Each of
the marginal distributions $\lambda_Y$, $\lambda_Z$, and $\lambda_R$ is obviously stationary for the induced chain on
the simplex, and both $\lambda_Y$ and $\lambda_Z$ have supports contained in $\partial \Delta_k$. Clearly, if $(Y, Z, R) \sim \lambda$
then $R = (Y + Z)/2$.

We may assume that $\lambda_Y = \lambda_Z$, for otherwise there is nothing to prove. We claim that
$\lambda_R \ne \lambda_Y$. To see this, let $D$ be the minimal integer such that $\lambda_Y$ is supported by the union
$\partial_D \Delta_k$ of the $D$-dimensional faces of $\Delta_k$. If $(Y, Z, R) \sim \lambda$, then $Y \ne Z$, since $\lambda_{Y,Z}$ has support
in $A$. Consequently, $(Y + Z)/2$ is contained in the interior of a $(D + 1)$-dimensional face of
$\Delta_k$. It follows that $\lambda_R \ne \lambda_Y$.

□
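A simple example where the collapse property fails, chosen here for illustration and not taken from the paper, is a paintbox sequence of uniformly random permutation matrices. Every product $Q_m$ is again a permutation matrix, so the image of the simplex never shrinks, and the barycenter of the simplex is a fixed point of the induced chain while any other starting point keeps moving within its orbit. The sketch below checks the first two statements numerically.

```python
import random
from math import dist, sqrt

def random_permutation_matrix(k, rng):
    """Column j is the unit vector e_{perm[j]}; a column-stochastic 0-1 matrix."""
    perm = list(range(k))
    rng.shuffle(perm)
    return [[1.0 if perm[j] == r else 0.0 for j in range(k)] for r in range(k)]

def matmul(A, B):
    k = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(k)) for j in range(k)] for i in range(k)]

def apply(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def simplex_image_diameter(Q):
    """Largest Euclidean distance between two columns of Q."""
    k = len(Q)
    cols = [[Q[r][j] for r in range(k)] for j in range(k)]
    return max(dist(a, b) for a in cols for b in cols)

rng = random.Random(3)
k = 3
Q = random_permutation_matrix(k, rng)
for _ in range(50):
    Q = matmul(random_permutation_matrix(k, rng), Q)

diam = simplex_image_diameter(Q)          # stays equal to sqrt(2): no collapse
barycenter = [1.0 / k] * k
print(diam, apply(Q, barycenter))
```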

Remark 4.6. Recurrence Times. Assume that the asymptotic collapse property (14) holds,
and let $\nu$ be the unique stationary distribution for the induced chain on the simplex. Say
that a point $v$ of the simplex is a support point of $\nu$ if $\nu$ gives positive probability to every
open neighborhood of $v$. Fix such a neighborhood $U$, and let $\tau$ be the first time $m \ge 1$ that
$Y_m \in U$. Then there exists $0 < r = r_U < 1$ such that for all $m \ge 1$,

  $P\{\tau > m\} \le r^m$,

regardless of the initial state $Y_0$ of the induced chain. To see this, observe that because
$\nu(U) > 0$ there exists $m$ such that the event $Q_m(\Delta_k) \subset U$ has positive probability.
Consequently, because the matrices $S_i$ are i.i.d., the probability that $Q_{mn}(\Delta_k) \not\subset U$ for all
$n = 1, 2, \ldots, N$ is exponentially decaying in $N$.

Remark 4.7. Relation between the induced chain on $\Delta_k$ and the EFCP. Let $\{X_m\}_{m \ge 0}$ be a version
of the EFCP on $[k]^{\mathbb N}$ with paintbox sequence $\{S_m\}_{m \ge 1}$. By Proposition 3.3 the individual
coordinate sequences $\{X^i_m\}_{m \ge 0}$ are conditionally independent given $\mathcal G = \sigma(X_0, S_1, S_2, \ldots)$,
and for each $i$ the sequence $\{X^i_m\}_{m \ge 0}$ evolves as a time-inhomogeneous Markov chain with
one-step transition probability matrices $S_m$. Consequently, by the strong law of large
numbers, if the initial state $X_0$ has the property that the limiting frequencies of all colors $r \in [k]$
exist with probability one (as would be the case if the initial distribution is exchangeable),
then this property persists for all times $m \ge 1$. In this case, the sequence $\{Y_m\}_{m \ge 0}$, where $Y_m$
is the vector of limiting color frequencies in the $m$th generation, is a version of the induced
Markov chain on the simplex $\Delta_k$. Moreover, the $j$th column of the stochastic matrix $S_m$
coincides with the limit frequencies of colors in $X_m$ among those indices $i \in \mathbb N$ such that
$X^i_{m-1} = j$. Thus, the paintbox sequence can be recovered (as a measurable function) from
the EFCP.

4.3. Asymptotic Decay Rates. Lebesgue measure on $\Delta_k$ is obtained by translating Lebesgue
measure on $V$ (the choice of Lebesgue measure depends on the choice of basis for $V$, but
for any two choices the corresponding Lebesgue measures differ only by a scalar multiple).
The $k$-fold product of Lebesgue measure on $\Delta_k$ will be referred to as Lebesgue measure on
$\Delta_k^k$.

Hypothesis 4.8. The distribution $\Sigma$ of the random stochastic matrix $S_1$ is absolutely continuous
with respect to Lebesgue measure on $\Delta_k^k$ and has a density of class $L^p$ for some $p > 1$.

Hypothesis 4.8 implies that the conditional distribution of the $i$th column of $S_1$, given the
other $k - 1$ columns, is absolutely continuous relative to Lebesgue measure on $\Delta_k$.
Consequently, the conditional probability that it is a linear combination of the other $k - 1$ columns
is 0. Therefore, the matrices $S_t$ are almost surely nonsingular, and so the Furstenberg
theory ([1], chapters 3-4) applies. Furthermore, under Hypothesis 4.8 the entries of $S_1$ are
positive, with probability 1. Thus, Hypothesis 4.8 implies Hypothesis 4.2.

Proposition 4.9. Under Hypothesis 4.8,

(17) $E\,|\log|\det S_1|| < \infty$,

and consequently

(18) $\lim_{n \to \infty} (\det(Q_n|_V))^{1/n} = e^{\kappa}$ where $\kappa = E \log \det S_1$.

Note 4.10. The determinant of $S_1$ is the volume of the polyhedron $S_1[0,1]^k$, which is $\sqrt{k}$
times the volume of the $(k-1)$-dimensional polyhedron with vertices $S_1 e_i$, where $1 \le i \le k$.
The volume of this $(k-1)$-dimensional polyhedron is the determinant of the restriction
$S_1|_V$. Consequently,

$$\det S_1|_V = \prod_{i=1}^{k-1} \lambda_{1,i}.$$

Proof. The assertion (18) follows from (17), by the strong law of large numbers, since the
determinant is multiplicative. It remains to prove (17). Fix $\varepsilon > 0$, and consider the event
$\det S_1 < \varepsilon$. This event can occur only if the smallest singular value of $S_1$ is less than
$\varepsilon^{1/k}$, and this can happen only if one of the vectors $S_1 e_i$ lies within distance $\varepsilon^{1/k}$
(or so) of a convex linear combination of the remaining $S_1 e_j$.
The vectors $S_1 e_i$, where $i \in [k]$, are the columns of $S_1$, whose distribution is assumed to
have an $L^p$ density $f(M)$ with respect to Lebesgue measure $dM$ on $\mathcal{S}_k$. Fix an integer $m \ge 1$,
and consider the subset $B_m$ of $\mathcal{S}_k$ consisting of all $k \times k$ stochastic matrices $M$ such that the
$i$th column $M e_i$ lies within distance $e^{-m}$ of the set of all convex combinations of the
remaining columns $M e_j$. Elementary geometry shows that the set $B_m$ has Lebesgue measure
$\le C e^{-m}$, for some constant $C = C_k$ depending on the dimension but not on $m$ or $i$.
Consequently, by the Hölder inequality, for a suitable constant $C' = C'_k < \infty$,



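$$
\begin{aligned}
E\,|\log|\det S_1|| &\le C' \sum_{m=0}^{\infty} (m+1) \int_{B_m} f(M)\,dM \\
&\le C' \sum_{m=0}^{\infty} (m+1) \Big(\int_{B_m} 1\,dM\Big)^{1/q} \Big(\int f(M)^p\,dM\Big)^{1/p} \\
&\le C' \sum_{m=0}^{\infty} (m+1)\, e^{-m/q} \Big(\int f(M)^p\,dM\Big)^{1/p} < \infty,
\end{aligned}
$$

where $1/p + 1/q = 1$. In fact, this also shows that $\log|\det S_1|$ has finite moments of all orders,
and even a finite moment generating function in a neighborhood of 0. □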
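
As a rough numerical illustration of (17) and (18), the following sketch takes $k = 2$ and assumes (this choice is ours, not the paper's) that the two columns of $S_1$ are independent and uniform on $\Delta_2$, so that $S_1$ has columns $(a, 1-a)$ and $(b, 1-b)$ with $a, b$ uniform on $[0,1]$. This law has a bounded density, so Hypothesis 4.8 holds, and $\det S_1 = a - b$. A direct calculation gives $E \log|a - b| = \int_0^1 2(1-x) \log x \, dx = -3/2$, which the Monte Carlo estimate should approach. The function name `estimate_kappa` is hypothetical.

```python
import math
import random


def estimate_kappa(num_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E log|det S_1| for k = 2.

    Assumption (not from the paper): the columns (a, 1-a) and (b, 1-b) of S_1
    are independent and uniform on the simplex, so det S_1 = a - b.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        a, b = rng.random(), rng.random()
        total += math.log(abs(a - b))  # log |det S_1|
    return total / num_samples


kappa_hat = estimate_kappa(200_000)
# Compare with the exact values -3/2 and exp(-3/2) for this particular law.
print(kappa_hat, math.exp(kappa_hat))
```

The heavy lower tail of $\log|a - b|$ is integrable, which is the content of (17) in this toy case.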

Proposition 4.11. Under Hypothesis 4.8,

(19) $\lim_{n \to \infty} \lambda_{n,1}^{1/n} := \lambda_1$ exists a.s.

Moreover, the limit $\lambda_1$ is constant and satisfies $0 < \lambda_1 < 1$.

Remark 4.12. It can be shown that the Lyapunov exponents of the sequence $Q_m$ are the
same as those of $Q_m|_V$, but with one additional Lyapunov exponent 0. Thus, $\log \lambda_1$ is the
second Lyapunov exponent of the sequence $Q_m$.

Remark 4.13. Hypothesis 4.8 implies that the distribution of $S_1$ is strongly irreducible (cf.
[1], ch. 3), and so a theorem of Furstenberg implies that the top two Lyapunov exponents
of the sequence $Q_m$ are distinct. However, additional hypotheses are needed to guarantee
that $\lambda_1 > 0$. This is the main point of Propositions 4.9-4.11.

Proof of Proposition 4.11. The almost sure convergence follows from the Furstenberg-Kesten
theorem [4] (or alternatively, Kingman's subadditive ergodic theorem [8]), because the
largest singular value of $Q_n|_V$ is the matrix norm of $Q_n|_V$, and the matrix norm is
submultiplicative. That the limit $\lambda_1$ is constant follows from the Kolmogorov 0-1 law, because
if the matrices $S_j$ are nonsingular (as they are under the hypotheses on the distribution of
$S_1$) the value of $\lambda_1$ will not depend on any initial segment $S_m S_{m-1} \cdots S_1$ of the matrix
products.

That $\lambda_1 < 1$ follows from assertion (15) of Proposition 4.1, because Hypothesis 4.2 implies
that there is a positive probability $\eta > 0$ that all entries of $S_1$ are at least $\varepsilon > 0$, in
which case $S_1$ is strictly contractive on $\Delta_k$ - and hence also on $V$ - with contraction factor
$\theta = \theta(\varepsilon) < 1$ (El, proposition 1.3).

Finally, the assertion that $\lambda_1 > 0$ follows from Proposition 4.9, because for any stochastic
matrix each singular value is bounded below by the determinant. □

Corollary 4.14. Under Hypothesis 4.8,

$$\lim_{n \to \infty} \max_{i \ne j} \|Q_n e_i - Q_n e_j\|^{1/n} = \lambda_1 \quad \text{almost surely.}$$






Proof. The lim sup of the maximum cannot be greater than $\lambda_1$, because for each $n$ the
singular value $\lambda_{n,1}$ of $Q_n|_V$ is just the matrix norm $\|Q_n|_V\|$. To prove the reverse
inequality, assume the contrary. Then there is a subsequence $n = n_m \to \infty$ along which

$$\limsup_{m \to \infty} \max_{i \ne j} \|Q_n e_i - Q_n e_j\|^{1/n} < \lambda_1 - \varepsilon$$

for some $\varepsilon > 0$. Denote by $u = u_n \in V$ the unit vector that maximizes $\|Q_n u\|$. Because
the vectors $e_i - e_{i+1}$ form a basis of $V$, for each $n$ the vector $u_n$ is a linear combination
$u_n = \sum_i a_{n,i}(e_i - e_{i+1})$, and because each $u_n$ is a unit vector, the coefficients $a_{n,i}$ are
uniformly bounded by (say) $C$ in magnitude. Consequently,

$$\|Q_n u_n\| \le C \sum_i \|Q_n(e_i - e_{i+1})\|.$$

This implies that along the subsequence $n = n_m$ we have

$$\limsup_{m \to \infty} \|Q_n u_n\|^{1/n} \le \lambda_1 - \varepsilon.$$

But this contradicts the fact that $\|Q_n|_V\|^{1/n} \to \lambda_1$ from Proposition 4.11.

□

Remark 4.15. It can also be shown that

$$\lim_{n \to \infty} \min_{i \ne j} \|Q_n e_i - Q_n e_j\|^{1/n} = \lambda_1.$$

This, however, will not be needed for the results of section 5.

Remark 4.16. The argument used to prove that $\lambda_1 < 1$ in the proof of Proposition 4.11 also
proves that even if Hypothesis 4.8 fails, if the distribution of $S_1$ puts positive weight on the
set of stochastic matrices with all entries at least $\varepsilon$, for some $\varepsilon > 0$, then

(20) $\limsup_{n \to \infty} \max_{i \ne j} \|Q_n e_i - Q_n e_j\|^{1/n} < 1$.

Hypothesis 4.8 guarantees that the sequence $\|Q_n e_i - Q_n e_j\|^{1/n}$ has a limit, and that the limit
is positive. When Hypothesis 4.8 fails, the convergence in (20) can be super-exponential
(i.e., the limsup in (20) can be 0). For instance, this is the case if for some rank-1 stochastic
matrix $A$ with all entries positive there is positive probability that $S_1 = A$.

5. Convergence to stationarity of EFCP chains

Assume throughout this section that $\{X_m\}_{m \ge 0}$ is an EFCP on $[k]^{[n]}$ or $[k]^{\mathbb{N}}$ with directing
measure $\mu_\Sigma$, as defined by (10). Let $S_1, S_2, \ldots$ be the associated paintbox sequence: these
are i.i.d. random column-stochastic matrices with distribution $\Sigma$. Proposition 3.3 shows
that the joint distribution of the coordinate variables $X^i_m$ of an EFCP chain with paintbox
sequence $\{S_i\}_{i \ge 1}$ is controlled by the random matrix products $Q_m = S_m S_{m-1} \cdots S_1$. In this
section we use this fact together with the results concerning random matrix products
recounted in section 4 to determine the mixing rates of the restrictions $\{X^{[n]}_m\}_{m \ge 0}$ of EFCP
chains to the finite configuration spaces $[k]^{[n]}$.

5.1. Ergodicity. An EFCP chain need not be ergodic: for instance, if each $S_i$ is the identity
matrix then every state is absorbing and $X^i_m = X^i_0$ for every $m \ge 1$ and every $i \in \mathbb{N}$.
More generally, if the random matrices $S_j$ are all permutation matrices then the unlabeled
partitions of $\mathbb{N}$ induced by the labeled partitions $X_m$ do not change with $m$, and so the
restrictions $X^{[n]}_m$ cannot be ergodic. The failure of ergodicity in these examples stems from
the fact that the matrix products $Q_m$ do not contract the simplex $\Delta_k$.

Proposition 5.1. Let $\lambda$ be any stationary distribution for the induced Markov chain on the simplex.
Then for each $n \in \mathbb{N} \cup \{\infty\}$ the $\lambda$-mixture $\varrho^n_\lambda$ of the product multinomial measures on $[k]^{[n]}$ is
stationary for the EFCP chain on $[k]^{[n]}$.

Note 5.2. Recall that the product-multinomial measures $\varrho^n_s$ are defined by equation (O; the
$\lambda$-mixture is defined to be the average

$$\varrho^n_\lambda = \int_{\Delta_k} \varrho^n_s \, \lambda(ds).$$

Thus, a random configuration $X \in [k]^{[n]}$ with distribution $\varrho^n_\lambda$ can be obtained by first
choosing $s \sim \lambda$, then, conditional on $s$, independently assigning colors to the coordinates
$i \in [n]$ by sampling from the $\varrho_s$ distribution.

Proof. This is an immediate consequence of Proposition 3.3. □
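
The two-stage sampling recipe in Note 5.2 can be sketched as follows. This is a minimal illustration only: `sample_s` is a hypothetical user-supplied sampler for the mixing distribution on the simplex, and colors are labeled $1, \ldots, k$.

```python
import random


def sample_mixture(n, sample_s, rng):
    """Draw a configuration in [k]^[n] from the mixture of product multinomials.

    `sample_s(rng)` is a hypothetical sampler returning a frequency vector s
    (a point of the simplex) drawn from the mixing distribution.
    """
    s = sample_s(rng)
    colors = list(range(1, len(s) + 1))
    # Conditional on s, the n coordinates are i.i.d. with distribution s.
    return rng.choices(colors, weights=s, k=n)


rng = random.Random(1)
point_mass = lambda r: [0.2, 0.3, 0.5]  # mixing distribution concentrated at one s
x = sample_mixture(10_000, point_mass, rng)
print([x.count(c) / len(x) for c in (1, 2, 3)])  # empirical color frequencies
```

With a point mass as mixing distribution the empirical frequencies should be close to $s$ itself; with a nondegenerate mixing distribution they vary from draw to draw.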

Proposition 5.3. Assume that with probability one the random matrix products $Q_m$ asymptotically
collapse the simplex $\Delta_k$, that is,

(21) $\lim_{m \to \infty} \mathrm{diameter}(Q_m(\Delta_k)) = 0$.

Then for each $n \in \mathbb{N}$ the corresponding EFCP chain $\{X^{[n]}_m\}_{m \ge 0}$ on $[k]^{[n]}$ is ergodic, i.e., has a
unique stationary distribution. Conversely, if for some $n \ge 1$ the EFCP chain $\{X^{[n]}_m\}_{m \ge 0}$ is
ergodic then the asymptotic collapse property (21) must hold.
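
The collapse property (21) can be contrasted with the permutation-matrix example of section 5.1 in a few lines. The sketch below is illustrative and rests on assumptions of our own: the random matrices have i.i.d. uniform columns, and the diameter is measured in the $\ell^1$ norm. Since $Q_m(\Delta_k)$ is the convex hull of the columns of $Q_m$, its diameter is the largest distance between two columns.

```python
import random


def random_stochastic(k, rng):
    """k x k column-stochastic matrix with i.i.d. uniform columns (an assumption)."""
    cols = []
    for _ in range(k):
        g = [rng.expovariate(1.0) for _ in range(k)]
        total = sum(g)
        cols.append([x / total for x in g])
    return [[cols[c][r] for c in range(k)] for r in range(k)]  # M[row][col]


def cyclic_permutation(k):
    """Permutation matrix sending e_c to e_{(c+1) mod k}."""
    return [[1.0 if r == (c + 1) % k else 0.0 for c in range(k)] for r in range(k)]


def matmul(A, B):
    k = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(k)) for c in range(k)]
            for r in range(k)]


def diameter(Q):
    """l1 diameter of Q(Delta_k): largest l1 distance between two columns of Q."""
    k = len(Q)
    return max(sum(abs(Q[r][i] - Q[r][j]) for r in range(k))
               for i in range(k) for j in range(k))


def diameter_after(m, k, draw):
    """Diameter of Q_m(Delta_k), where Q_m = S_m ... S_1 and each S_t = draw()."""
    Q = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    for _ in range(m):
        Q = matmul(draw(), Q)
    return diameter(Q)


rng = random.Random(0)
print(diameter_after(60, 3, lambda: random_stochastic(3, rng)))  # very small
print(diameter_after(60, 3, lambda: cyclic_permutation(3)))      # no contraction
```

In the permutation case the columns of $Q_m$ remain distinct unit vectors, so the diameter never shrinks, which matches the failure of ergodicity described in section 5.1.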

Proof. Fix $n \ge 1$. By Propositions 4.4 and 5.1 there exists at least one stationary distribution
$\pi$. Let $\{X_m\}_{m \ge 0}$ and $\{\tilde{X}_m\}_{m \ge 0}$ be conditionally independent versions of the EFCP given the
(same) paintbox sequence $(S_i)_{i \ge 1}$, with $X_0 \sim \pi$ and $\tilde{X}_0 \sim \nu$ arbitrary. Then for any time
$m \ge 1$ the conditional distributions of $X_m$ and $\tilde{X}_m$ given the paintbox sequence can be
recovered from the formula (12) by integrating out over the distributions of $X_0$ and $\tilde{X}_0$,
respectively. But under the hypothesis (21), for large $m$ the columns of $Q_m$ are, with high
probability, nearly identical, and so for large $m$ the products

$$\prod_{i=1}^{n} Q_m(x^i, X^i_0) \quad \text{and} \quad \prod_{i=1}^{n} Q_m(x^i, \tilde{X}^i_0)$$

will be very nearly the same. It follows, by integrating over all paintbox sequences, that
the unconditional distributions of $X_m$ and $\tilde{X}_m$ will be nearly the same when $m$ is large. This
proves that the stationary distribution $\pi$ is unique and that as $m \to \infty$ the distribution of
$\tilde{X}_m$ converges to $\pi$.

By Proposition 4.4, if the asymptotic collapse property (21) fails then the induced Markov
chain on the simplex has at least two distinct stationary distributions $\mu, \nu$. By
Proposition 5.1, these correspond to different stationary distributions for the EFCP.

□

5.2. Mixing rate and cutoff for EFCP chains. We measure distance to stationarity using
the total variation metric. Write $\mathcal{D}(X_m)$ to denote the distribution of $X_m$. In general,
the distance $\|\mathcal{D}(X_m) - \pi\|_{TV}$ will depend on the distribution of the initial state $X_0$. The
$\varepsilon$-mixing time is defined to be the number of steps needed to bring the total variation
distance between $\mathcal{D}(X_m)$ and $\pi$ below $\varepsilon$ for all initial states $x_0$:

(22) $t_{\mathrm{mix}}(\varepsilon) = t^{(n)}_{\mathrm{mix}}(\varepsilon) = \min\{m \ge 1 : \max_{x_0} \|\mathcal{D}(X_m) - \pi\|_{TV} < \varepsilon\}$.
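
Definition (22) can be evaluated directly for any small finite chain whose transition matrix and stationary distribution are known. The sketch below is a generic illustration on a two-state chain, not an EFCP chain; the function names are hypothetical and the stationary distribution is supplied by the caller rather than computed.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def mixing_time(P, pi, eps, max_steps=10_000):
    """Smallest m >= 1 with max over initial states of ||D(X_m) - pi||_TV < eps."""
    n = len(P)
    # Start one copy of the chain from each deterministic initial state.
    dists = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for m in range(1, max_steps + 1):
        dists = [step(d, P) for d in dists]
        if max(tv_distance(d, pi) for d in dists) < eps:
            return m
    raise ValueError("chain did not mix within max_steps")


P = [[0.75, 0.25], [0.25, 0.75]]  # symmetric two-state chain, pi = (1/2, 1/2)
print(mixing_time(P, [0.5, 0.5], 0.1))
```

For this chain the worst-case distance at time $m$ is $\tfrac12 (1/2)^m$, so the distance first drops below $0.1$ at $m = 3$.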

Theorem 5.4. Assume that with probability one the random matrix products $Q_m = S_m S_{m-1} \cdots S_1$
asymptotically collapse the simplex $\Delta_k$, that is, relation (14) holds. Then for a suitable constant
$K = K_\Sigma < \infty$ depending only on the distribution $\Sigma$ of $S_1$, the mixing times of the corresponding
EFCP chains on the finite state spaces $[k]^{[n]}$ satisfy

(23) $t^{(n)}_{\mathrm{mix}}(\varepsilon) \le K \log n$.

Remark 5.5. In some cases the mixing times will be of smaller order of magnitude than
$\log n$. Suppose, for instance, that for some $m \ge 1$ the event that the matrix $Q_m$ is of rank
1 has positive probability. (This would be the case, for instance, if the columns of $S_1$ were
independently chosen from a probability distribution on $\Delta_k$ with an atom.) Let $T$ be the
least $m$ for which this is the case; then $T < \infty$ almost surely, since matrix rank is
sub-multiplicative, and $Q_m(\Delta_k)$ is a singleton for any $m \ge T$. Consequently, for any elements
$a, b, c \in [k]$,

$$Q_m(a, b) = Q_m(a, c) \quad \text{if } T \le m.$$

Hence, if $\{X_m\}_{m \ge 0}$ and $\{\tilde{X}_m\}_{m \ge 0}$ are versions of the EFCP with different initial conditions
$X_0$ and $\tilde{X}_0$, but with the same paintbox sequence $S_m$, then by Proposition 3.3, $X_m$ and $\tilde{X}_m$
have the same conditional distribution, given $\sigma(S_i)_{i \ge 1}$, on the event $T \le m$. It follows that
the total variation distance between the unconditional distributions of $X_m$ and $\tilde{X}_m$ is no
greater than $P\{T > m\}$. Thus, for any $n \in \mathbb{N}$, the EFCP mixes in $O(1)$ steps, that is, for any
$\varepsilon > 0$ there exists $K_\varepsilon < \infty$ such that for all $n$,

$$t^{(n)}_{\mathrm{mix}}(\varepsilon) \le K_\varepsilon.$$

Proof of Theorem 5.4. (A) Consider first the special case where for some $\delta > 0$ every entry of
$S_1$ is at least $\delta$, with probability one. It then follows that no entry of $Q_m$ is smaller than $\delta$.
By Proposition 4.1, if (14) holds then the diameters of the sets $Q_m(\Delta_k)$ shrink exponentially
fast: in particular, for some (nonrandom) $\varrho < 1$,

(24) $\mathrm{diameter}(Q_m(\Delta_k)) \le \varrho^m$

eventually, with probability 1.

Let $\{X_m\}_{m \ge 0}$ and $\{\tilde{X}_m\}_{m \ge 0}$ be versions of the EFCP on $[k]^{[n]}$ with different initial conditions
$X_0$ and $\tilde{X}_0$, but with the same paintbox sequence $S_m$. By Proposition 3.3, the conditional
distributions of $X_m$ and $\tilde{X}_m$ given the paintbox sequence $\mathcal{S} = \sigma(S_t)_{t \ge 1}$ are
product-multinomials:

(25) $P(X^i_m = x^i \text{ for each } i \in [n] \mid \mathcal{S}) = \prod_{i=1}^{n} Q_m(x^i, X^i_0)$ and

$P(\tilde{X}^i_m = x^i \text{ for each } i \in [n] \mid \mathcal{S}) = \prod_{i=1}^{n} Q_m(x^i, \tilde{X}^i_0)$.

Since the multinomial distributions $Q_m(\cdot, \cdot)$ assign probability at least $\delta > 0$ to every color
$j \in [k]$, Corollary 2.3 implies that for any $\varepsilon > 0$, if $m = K \log n$, where $K > -1/(2 \log \varrho)$, then
for all sufficiently large $n$ the total variation distance between the conditional distributions
of $X_m$ and $\tilde{X}_m$ will be less than $\varepsilon$ on the event that (24) holds. Since (24) holds eventually
with probability one, the inequality (23) now follows by Lemma 2.4.

(B) The general case requires a bit more care, because if the entries of the matrices $Q_m$ are
not bounded below then the product-multinomial distributions (25) will not be bounded
away from $\partial \Delta_k$, as required by Corollary 2.3.

Assume first that for some $m \ge 1$ there is positive probability that $Q_m(\Delta_k)$ is contained in
the interior of $\Delta_k$. Then for some $\delta > 0$ there is probability at least $\delta$ that every entry of $Q_m$
is at least $\delta$. Consequently, for any $\alpha > 0$ and any $K > 0$, with probability converging to one
as $n \to \infty$, there will exist $m \in [K \log n, K(1 + \alpha) \log n]$ (possibly random) such that every
entry of $Q_m$ is at least $\delta$. By (24) the probability that the diameter of $Q_m(\Delta_k)$ is less than
$\varrho^m$ converges to 1 as $m \to \infty$. It then follows from Corollary 2.3, by the same argument
as in (A), that if $K > -1/(2 \log \varrho)$ then the total variation distance between the conditional
distributions of $X_m$ and $\tilde{X}_m$ will be vanishingly small. Since total variation distance
decreases with time, it follows that the total variation distance between the
conditional distributions of $X$ and $\tilde{X}$ at time $K(1 + \alpha) \log n$ is also vanishingly small.
Consequently, the distance between the unconditional distributions is also small, and so (23)
follows, by Lemma 2.4.

(C) Finally, consider the case where $Q_m(\Delta_k)$ intersects $\partial \Delta_k$ for every $m$, with probability
one. Recall (Proposition 5.1) that if the asymptotic collapse property (14) holds then the
induced Markov chain $Y_m$ on the simplex has a unique stationary distribution $\nu$. If there is
no $m \in \mathbb{N}$ such that $Q_m(\Delta_k)$ is contained in the interior of $\Delta_k$, then the support of $\nu$ must be
contained in the boundary $\partial \Delta_k$. Fix a support point $v$, and let $m$ be sufficiently large that
(24) holds. Since $Q_m(\Delta_k)$ must intersect $\partial \Delta_k$, it follows that for any coordinate $a \in [k]$ such
that $v_a = 0$ (note that there must be at least one such $a$, because $v \in \partial \Delta_k$), the $a$th coordinate
$(Q_m y)_a$ of any point in the image $Q_m(\Delta_k)$ must be smaller than $\varrho^m$. If $K$ is chosen
sufficiently large and $m \ge K \log n$, then $\varrho^m < n^{-2}$; hence, by Proposition 3.3,

$$P(X^i_m = a \text{ for some } i \in [n] \mid \sigma(S_l)_{l \ge 1}) \le n \cdot n^{-2} = n^{-1} \to 0,$$

and similarly for $\tilde{X}_m$. Therefore, the contribution to the total variation distance between the
conditional distributions of $X_m$ and $\tilde{X}_m$ from states $x^1 x^2 \cdots x^n$ in which the color $a$ appears
at least once is vanishingly small. But for those states for which no such color appears, the
factors $Q_m(a, b)$ in (25) will be bounded below by the minimum nonzero entry of $v$, and the
result will follow by a routine modification of the argument in (B) above. □

Parts (A)-(B) of the foregoing proof provide an explicit bound in the special case where
$Q_m(\Delta_k)$ is contained in the interior of $\Delta_k$ with positive probability.

Corollary 5.6. Assume that with probability one the random matrix products $Q_m = S_m S_{m-1} \cdots S_1$
asymptotically collapse the simplex $\Delta_k$, so that for some $0 < \varrho < 1$,

$$\mathrm{diameter}(Q_m(\Delta_k)) < \varrho^m$$

for all sufficiently large $m$, with probability 1. Assume also that with positive probability $Q_m(\Delta_k)$
is contained in the interior of $\Delta_k$, for some $m \ge 1$. Then for any $K > -1/(2 \log \varrho)$ the bound (23)
holds for all sufficiently large $n$.

Theorem 5.7. Assume that the paintbox distribution $\Sigma$ satisfies Hypothesis 4.8. Then the
corresponding EFCP chains exhibit the cutoff phenomenon, that is, for all $\varepsilon, \delta \in (0, 1/2)$, if $n$ is
sufficiently large, then

(26) $(\theta - \delta) \log n \le t^{(n)}_{\mathrm{mix}}(1 - \varepsilon) \le t^{(n)}_{\mathrm{mix}}(\varepsilon) \le (\theta + \delta) \log n$,

where

(27) $\theta = -1/(2 \log \lambda_1)$

and $\log \lambda_1$ is the second Lyapunov exponent of the sequence $Q_m$, with $\lambda_1$ as in Proposition 4.11.
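
To make (26) and (27) concrete, the sketch below computes $\theta$ and the window $[(\theta - \delta)\log n, (\theta + \delta)\log n]$ for a given $\lambda_1$. The value $\lambda_1 = e^{-3/2}$ used in the example is an assumption for illustration: it corresponds to $k = 2$ with the two columns of $S_1$ independent and uniform on $\Delta_2$, where $Q_n|_V$ acts as multiplication by $\prod_m (a_m - b_m)$ and $E \log|a - b| = -3/2$. The function names are hypothetical.

```python
import math


def theta(lambda1):
    """Cutoff constant theta = -1 / (2 log lambda_1), for 0 < lambda_1 < 1."""
    if not 0.0 < lambda1 < 1.0:
        raise ValueError("lambda1 must lie in (0, 1)")
    return -1.0 / (2.0 * math.log(lambda1))


def cutoff_window(lambda1, n, delta):
    """Endpoints ((theta - delta) log n, (theta + delta) log n) appearing in (26)."""
    th = theta(lambda1)
    return (th - delta) * math.log(n), (th + delta) * math.log(n)


def column_gap_scale(lambda1, n, delta):
    """lambda_1^m at m = (theta - delta) log n; equals n^(-1/2 + delta*|log lambda_1|)."""
    m = (theta(lambda1) - delta) * math.log(n)
    return lambda1 ** m


lam = math.exp(-1.5)  # assumed value of lambda_1 (see the lead-in)
print(theta(lam))
print(cutoff_window(lam, 10**6, 0.05))
print(column_gap_scale(lam, 10**6, 0.05) * math.sqrt(10**6))
```

The last quantity exceeds 1 and grows like a positive power of $n$: just before the cutoff, the gap between columns of $Q_m$ is still larger than the $n^{-1/2}$ sampling scale.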

Proof of the Upper Bound $t_{\mathrm{mix}}(\varepsilon) \le (\theta + \delta) \log n$. Because the distribution of $S_1$ is absolutely
continuous with respect to Lebesgue measure, there is positive probability that all entries
of $S_1 = Q_1$ are positive, and so there is positive probability that $Q_1(\Delta_k)$ is contained in the
interior of $\Delta_k$. Therefore, Corollary 5.6 applies. But Proposition 4.11 and Corollary 4.14
imply that, under Hypothesis 4.8, $\varrho = \lambda_1$.

□

Proof of the Lower Bound $t_{\mathrm{mix}}(\varepsilon) \ge (\theta - \delta) \log n$. It suffices to show that there exist initial
states $x_0, \tilde{x}_0$ such that if $\{X_t\}_{t \ge 0}$ and $\{\tilde{X}_t\}_{t \ge 0}$ are versions of the EFCP with initial states
$X_0 = x_0$ and $\tilde{X}_0 = \tilde{x}_0$, respectively, then the distributions of $X_m$ and $\tilde{X}_m$ have total variation
distance near 1 when $m \le (\theta - \delta) \log n$. The proof will rely on Corollary 4.14, according to which
there is a possibly random pair of indices $i \ne j$ for which

(28) $\lim_{m \to \infty} \|Q_m e_i - Q_m e_j\|^{1/m} = \lambda_1$.

Consider first, to fix ideas, the special case $k = 2$. In this case (28) holds with $i = 1$ and
$j = 2$. Assume that $n = 2n'$ is even (if $n$ is odd, project onto the first $n - 1$ coordinates), and
let

$$x_0 = 11 \cdots 111 \cdots 1 \quad \text{and} \quad \tilde{x}_0 = 111 \cdots 122 \cdots 2$$

be the elements of $[k]^n$ such that $x_0$ has all coordinates colored 1, while $\tilde{x}_0$ has its first $n'$
colored 1 but its second $n'$ colored 2. We will show that the distributions of $X_m$ and $\tilde{X}_m$ remain
at large total variation distance at time $m = (\theta - \delta) \log n$. Without loss of generality,
assume that both of the chains $\{X_t\}_{t \ge 0}$ and $\{\tilde{X}_t\}_{t \ge 0}$ have the same paintbox sequence $S_1, S_2, \ldots$.
Then by Proposition 3.3 the conditional distributions of $X_m$ and $\tilde{X}_m$ given $\mathcal{S} = \sigma(S_t)_{t \ge 1}$ are
product-multinomials; in particular, for any state $x = (x^l)_{l \in [n]} \in [k]^{[n]}$,

$$P(X^l_m = x^l \text{ for all } l \in [n] \mid \mathcal{S}) = \prod_{l=1}^{n} Q_m(x^l, 1) \quad \text{and}$$

$$P(\tilde{X}^l_m = x^l \text{ for all } l \in [n] \mid \mathcal{S}) = \prod_{l=1}^{n'} Q_m(x^l, 1) \prod_{l=n'+1}^{2n'} Q_m(x^l, 2).$$

But relation (28) implies that, for some $\alpha = \alpha(\delta) > 0$, if $m = (\theta - \delta) \log n$ then the
$\ell^\infty$-distance between the $i$th and $j$th columns of $Q_m$ is at least $n^{-1/2+\alpha}$, with probability
approaching 1 as $n \to \infty$. Consequently, the first $n'$ and second $n'$ coordinates of $\tilde{X}_m$ are
(conditional on $\mathcal{S}$) independent samples from Bernoulli distributions whose parameters differ by
at least $n^{-1/2+\alpha}$, but the $2n'$ coordinates of $X_m$ are (conditional on $\mathcal{S}$) a single sample from
the same Bernoulli distribution. It follows, by Lemma 2.5 (see Remark 2.6, statement (B)), that the
unconditional distributions of $X_m$ and $\tilde{X}_m$ are at large total variation distance, because in $\tilde{X}_m$
the first and second blocks of $n'$ coordinates are distinguishable whereas in $X_m$ they are
not. Thus, if $m = (\theta - \delta) \log n$ then as $n \to \infty$,

$$\|\mathcal{D}(X_m) - \mathcal{D}(\tilde{X}_m)\|_{TV} \to 1.$$


The general case is proved by a similar argument. Let $n = 2k(k-1)n'$ be an integer
multiple of $2k(k-1)$. Break the coordinate set $[n]$ into $k(k-1)$ non-overlapping blocks of
size $2n'$, one for each ordered pair $(i, j)$ of distinct colors. In the block indexed by $(i, j)$
let $x_0$ take the value $i$, and let $\tilde{x}_0$ take the value $i$ in the first half of the block and the
value $j$ in the second half. Let $\{X_t\}_{t \ge 0}$ and $\{\tilde{X}_t\}_{t \ge 0}$ be versions of the EFCP with initial
states $x_0$ and $\tilde{x}_0$, respectively. Then by an argument similar to that used in the binary
case $k = 2$, if $m = (\theta - \delta) \log n$ then for large $n$, in some block $(i, j)$ the first $n'$ and
second $n'$ coordinates of $\tilde{X}_m$ will be distinguishable, but in $X_m$ they will not. Therefore, the
unconditional distributions of $X_m$ and $\tilde{X}_m$ will be at total variation distance near 1.

□

Example 5.8 (Self-similar cut-and-paste chains). Self-similar cut-and-paste chains were
introduced in Q. These are EFCP chains for which the paintbox measure $\Sigma = \Sigma_\nu$ is such that if
$S_1 \sim \Sigma$ then the columns of $S_1$ are i.i.d. with common distribution $\nu$, for some probability
distribution $\nu$ on $\Delta_k$. If $S_1, S_2, \ldots$ are i.i.d. with distribution $\Sigma_\nu$ then the random matrix
products $Q_m = S_m S_{m-1} \cdots S_1$ asymptotically collapse the simplex provided the measure $\nu$ is
nontrivial (i.e., not a point mass), and so Theorem 5.4 applies. If in addition the measure $\nu$
has a density of class $L^p$ relative to Lebesgue measure on $\Delta_k$, then Theorem 5.7 applies.

5.3. Examples. We now discuss some examples of Markov chains on $\mathcal{L}_{[n]:k}$ whose
transitions are governed by an i.i.d. sequence of random partition matrices $M_1, M_2, \ldots$ with law
$\mu$, but are not EFCP chains because $\mu$ does not coincide with $\mu_\Sigma$ for any probability
measure $\Sigma$ on $\mathcal{S}_k$. As a result, the examples we show are not covered by Theorems 5.4 or 5.7.
We are, however, able to establish upper bounds and, in some cases, cutoff using different
techniques. All of the chains in these examples are reversible and ergodic relative to the
uniform distribution on $[k]^{[n]}$.

Example 5.9 (Ehrenfest chain on the hypercube). For $k = 2$, any $L \in \mathcal{L}_{[n]:2}$ can be regarded as
an element in $[2]^n$, or equivalently $\{0, 1\}^n$. For each $i = 1, \ldots, n$ and $a \in \{1, 2\}$, we define $M_{a,i}$
as the $2 \times 2$ partition matrix with entries

Let $x_0 \in \mathcal{L}_{[n]:k}$ be an initial state and first choose $a_1, a_2, \ldots$ i.i.d. Bernoulli(1/2) and,
independently of $(a_m)$, choose $i_1, i_2, \ldots$ i.i.d. from the uniform distribution on $[n]$. Then the chain
$X = (X_m, m \ge 0)$ is constructed by $X_0 = x_0$ and, for $m = 1, 2, \ldots$, $X_m = M_{a_m + 1, i_m} X_{m-1}$. This
corresponds to the usual Ehrenfest chain on the hypercube, which is known to exhibit the
cutoff phenomenon at $(1/2) n \log n$; for example, see [9], example 18.2.2.

Example 5.10 (General Ehrenfest chain). A more general form of the Ehrenfest chain in the
previous example is described as follows. Fix $n \in \mathbb{N}$, take $\alpha \in (0, 1)$ and choose a random
subset $A \subset [n]$ uniformly among all subsets of $[n]$ with cardinality
$\lfloor \alpha n \rfloor := \max\{r \in \mathbb{N} : r \le \alpha n\}$, the floor of $\alpha n$. For $i \in [2]$ and $A \subset [n]$, we define the
partition matrix $M(A, i)$ by either

or

Let $A = (A_1, A_2, \ldots)$ be an i.i.d. sequence of uniform subsets of size $\lfloor \alpha n \rfloor$, let $I = (I_1, I_2, \ldots)$
be i.i.d. from the uniform distribution on $\{1, 2\}$ and let $x_0 \in [2]^n$. Conditional on $A$ and $I$,
we construct $X = (X_m, m \ge 0)$ by putting $X_0 = x_0$ and, for each $m \ge 1$,

$$X_m := M(A_m, I_m) \cdots M(A_1, I_1) X_0.$$

We call $X$ an Ehrenfest($\alpha$) chain.
Define the coupling time $T$ by

$$T := \min\Big\{t \ge 1 : \bigcup_{j=1}^{t} A_j = [n]\Big\}.$$

Any two chains $X$ and $X'$ constructed from the same sequence $A$ will be coupled by time
$T$.

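An upper bound on the distance to stationarity of the general Ehrenfest($\alpha$) chain is
obtained by standard properties of the hypergeometric distribution. In particular, let
$R_t := \#([n] \setminus \bigcup_{j=1}^{t} A_j)$ be the number of indices that have not appeared in one of
$A_1, \ldots, A_t$. By definition, $\{T \le t\} = \{R_t = 0\}$ and standard calculations give

$$P(R_{t+1} = j \mid R_t = r) = \binom{r}{r-j} \binom{n-r}{\lfloor \alpha n \rfloor - (r-j)} \Big/ \binom{n}{\lfloor \alpha n \rfloor}, \qquad j = 0, 1, \ldots, r.$$

For fixed $\alpha \in (0, 1)$, the total variation distance at time $t$ is bounded above by

(29) $\|\mathcal{D}(X_t) - \pi\|_{TV} \le n \Big(1 - \frac{\lfloor \alpha n \rfloor}{n}\Big)^{t} \le n \exp\{-\lfloor \alpha n \rfloor t / n\}$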

and it immediately follows, for j6 > and t = (jn^r l°g n + fi\jtr\ \ tnat 

$$\|\mathcal{D}(X_t) - \pi\|_{TV} \le n^{-1/2} \exp(-\beta) \to 0 \quad \text{as } \beta \to \infty.$$
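
The first inequality in (29) is a union bound on the coupling time: each index is missed by a given subset with probability $1 - \lfloor \alpha n \rfloor / n$, so $P(T > t) \le n (1 - \lfloor \alpha n \rfloor / n)^t$. The sketch below simulates only the covering time of the subsets $A_1, A_2, \ldots$ and compares its empirical tail with this bound. It does not simulate the chain itself, and the function names are hypothetical.

```python
import math
import random


def coupling_time(n, size, rng):
    """First t >= 1 at which uniform subsets A_1, ..., A_t of the given size cover [n]."""
    uncovered = set(range(n))
    t = 0
    while uncovered:
        t += 1
        uncovered.difference_update(rng.sample(range(n), size))
    return t


def tail_estimate(n, alpha, t, trials, seed=0):
    """Empirical estimate of P(T > t); assumes floor(alpha * n) >= 1."""
    rng = random.Random(seed)
    size = math.floor(alpha * n)
    exceed = sum(coupling_time(n, size, rng) > t for _ in range(trials))
    return exceed / trials


def union_bound(n, alpha, t):
    """The bound n * (1 - floor(alpha n) / n)^t from (29)."""
    size = math.floor(alpha * n)
    return n * (1.0 - size / n) ** t


n, alpha, t = 40, 0.25, 20
print(tail_estimate(n, alpha, t, trials=2000), union_bound(n, alpha, t))
```

Up to sampling error, the empirical tail should sit slightly below the bound, the difference coming from the events in which two or more indices are simultaneously uncovered.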

When $\alpha \in (0, 1/2]$, we can use proposition 7.8 from [9] and some standard theory for
coupon collecting to obtain the lower bound

$$\|\mathcal{D}(X_t) - \pi\|_{TV} \ge 1 - 8 \exp\{-2\beta + 1\},$$

when $t = \frac{n}{2\lfloor \alpha n \rfloor} \log n - \beta \frac{n}{\lfloor \alpha n \rfloor}$. Hence, these chains exhibit cutoff at
$\frac{n}{2\lfloor \alpha n \rfloor} \log n$. Note that the standard Ehrenfest chain (Example 5.9) corresponds to
$\alpha = 1/n$.

Example 5.11 (A $\log \log n$ upper bound on mixing time). For the general Ehrenfest chains
described above, the upper bound (29) on mixing time can be applied more generally to
sequences $\alpha := (\alpha_1, \alpha_2, \ldots)$ in $(0, 1)$. For each $n \in \mathbb{N}$, let $\alpha_n = 1 - \exp\{-\log n / \log \log n\}$ and
let $X^n$ be an Ehrenfest($\alpha_n$) chain. By (29), for $t \ge (1 + \beta) \log \log n$, $\beta > 0$, we have

$$\|\mathcal{D}(X^n_t) - \pi\|_{TV} \le n^{-\beta},$$

which converges to 0 as $n \to \infty$.
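
The arithmetic behind this bound can be checked numerically. The sketch below evaluates $n(1 - \alpha_n)^t$ at $t = (1 + \beta)\log\log n$ and compares it with $n^{-\beta}$. It is an approximation in two respects, both assumptions of the sketch: the ratio $\lfloor \alpha_n n \rfloor / n$ in (29) is replaced by $\alpha_n$, and $t$ is treated as a real number rather than an integer.

```python
import math


def alpha_n(n):
    """alpha_n = 1 - exp(-log n / log log n); requires n >= 3 so that log log n > 0."""
    return 1.0 - math.exp(-math.log(n) / math.log(math.log(n)))


def bound_29(n, alpha, t):
    """n * (1 - alpha)^t: the bound (29) with floor(alpha n)/n replaced by alpha."""
    return n * (1.0 - alpha) ** t


n, beta = 10**6, 0.5
t = (1.0 + beta) * math.log(math.log(n))
print(bound_29(n, alpha_n(n), t), n ** (-beta))
```

The two printed numbers agree up to rounding, since $(1 - \alpha_n)^t = \exp\{-(1 + \beta)\log n\} = n^{-(1+\beta)}$ at this choice of $t$.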

In general, we can obtain an upper bound of $(1 + \beta) f(n)$, where $f(n)$ is a function of
$n \in \mathbb{N}$, by the relation

$$\alpha_n = 1 - \exp\{-\log n / f(n)\}.$$

The space $[k]^n$ is a group under addition modulo $k$ defined by

$$x + x' = (x + x' - 2 \pmod{k}) + 1.$$

Write $\mathbb{N}^n_k$ to denote the group $[k]^n$ together with the operation $+$, which we define by
componentwise addition modulo $k$ of the coordinates of $x \in [k]^n$. That is, for any
$x, x' \in [k]^n$, we have

$$(x + x')^i = (x^i + x'^i - 2 \pmod{k}) + 1.$$
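This action makes the space $\mathcal{L}_{[n]:k}$ into a group with a corresponding action $\bullet$ that can also
be represented by left action of a partition matrix as follows. If we regard $L, L' \in \mathcal{L}_{[n]:k}$
as elements of the group $(\mathbb{N}^n_k, +)$, then we define the group action $L \bullet L' = L + L'$ in the
obvious way. Alternatively, for each $L \in \mathcal{L}_{[n]:k}$, define $M_L \in \mathcal{M}_{[n]:k}$ as the $k \times k$ matrix whose
$j$th column is the $j$th cyclic shift of the classes of $L$; that is,

(30) $$M_L := \begin{pmatrix} L_1 & L_k & L_{k-1} & \cdots & L_2 \\ L_2 & L_1 & L_k & \cdots & L_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ L_k & L_{k-1} & L_{k-2} & \cdots & L_1 \end{pmatrix}.$$

Then, for every $L, L' \in \mathcal{L}_{[\infty]:k}$, we have

$$L \bullet L' := M_L L'.$$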
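
The group operation and its matrix representation can be checked on small examples. In the sketch below, colorings are lists of colors in $\{1, \ldots, k\}$. The matrix action rests on an assumption that should be read as such: the $(i, j)$ entry of $M_L$ is taken to be the class $L_{((i - j) \bmod k) + 1}$ (a circulant arrangement of the classes of $L$), and $(M_L x)_i$ is the union over $j$ of the intersections of that entry with the $j$th class of $x$. The function names are hypothetical.

```python
def add_colors(a, b, k):
    """Group operation on [k] = {1, ..., k}: ((a + b - 2) mod k) + 1."""
    return (a + b - 2) % k + 1


def add_colorings(x, y, k):
    """Componentwise group operation on [k]^n."""
    return [add_colors(a, b, k) for a, b in zip(x, y)]


def act_by_matrix(L, x, k):
    """Apply the partition matrix M_L to the coloring x.

    Assumption: entry (i, j) of M_L is the class L_{((i - j) mod k) + 1}, so
    index u (with x-color j = x[u]) receives the unique color i such that
    u lies in that class of L.
    """
    out = []
    for u in range(len(x)):
        for i in range(1, k + 1):
            if (i - x[u]) % k + 1 == L[u]:
                out.append(i)
                break
    return out


k = 4
L = [1, 2, 3, 4, 2, 4]
x = [3, 3, 1, 4, 2, 2]
print(add_colorings(L, x, k))
print(act_by_matrix(L, x, k))  # agrees with the line above
```

Under this convention the all-ones coloring is the identity element, and the matrix action reproduces componentwise addition.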

Example 5.12. For $n \in \mathbb{N}$, let $\varrho_n$ be a probability measure on $\mathcal{L}_{[n]:k}$ and let $L_0 \in \mathcal{L}_{[n]:k}$. A
$CP_n(\varrho_n)$ chain $X$ with initial state $X_0 = L_0$ can be constructed as follows. First, generate
$L_1, L_2, \ldots$ i.i.d. from $\varrho_n$. Conditional on $L_1, L_2, \ldots$, put $X_m = L_m \bullet \cdots \bullet L_1 \bullet X_0$. Under the
definition (30) this is a cut-and-paste chain; however, the columns of each matrix are a
deterministic function of one another and are not conditionally independent (as in previous
examples).

Consider the case where $\varrho_n$ is a product measure of a probability measure $\lambda$ on $[k]$ which
is symmetric, i.e.

$$\lambda(j) = \lambda(k - j + 1) > 0, \qquad j = 1, \ldots, k.$$

In this case, it is easy to see that the $CP_n(\varrho_n)$ chain is reversible and hence has the uniform
distribution as its unique stationary distribution.

For this construction of $X$, the directing measure $\mu$ on $\mathcal{M}_{[n]:k}$ induced by $\lambda$ is neither
row-column exchangeable (RCE) nor can it be represented as $\mu_\Sigma$ for some measure $\Sigma$ on
$\mathcal{S}_k$. Nonetheless, the mixing time of $X$ is bounded above by $K \log n$ for some constant
$K \le 2 / \min_j \lambda(j) < \infty$.

6. Projected cut-and-paste chains

Recall that there is a natural projection $\Pi_n : \mathcal{L}_{[n]:k} \to \mathcal{P}_{[n]:k}$ from the set $\mathcal{L}_{[n]:k}$ of labeled
partitions of $[n]$ to the set $\mathcal{P}_{[n]:k}$ of unlabeled partitions. If $\{X_m\}_{m \ge 0}$ is a Markov chain on the
set $[k]^n = \mathcal{L}_{[n]:k}$ whose transition probability matrix is invariant under permutations of the
labels $[k]$, then the projection $\{\Pi_n(X_m)\}_{m \ge 0}$ is also a Markov chain. Assume henceforth that
this is the case.

Following is a simple sufficient condition for the law of an EFCP chain to be invariant
under permutations of the label set $[k]$. Say that a probability measure $\Sigma$ on the space $\mathcal{S}_k$
of column-stochastic matrices is row-column exchangeable if the distribution of $S_1 \sim \Sigma$ is
invariant under permutations of the rows or the columns.

Lemma 6.1. If $\{X_m\}_{m \ge 0}$ is an EFCP chain on the set $[k]^n = \mathcal{L}_{[n]:k}$ whose paintbox measure $\Sigma$ is
row-column exchangeable then its transition probability matrix is invariant under permutations of
the labels $[k]$.

Proof. Let $\gamma$ be a permutation of $[k]$ and define $\Gamma \in \mathcal{M}_{[n]:k}$ as the partition matrix with
entries

$$\Gamma_{ij} = \begin{cases} [n], & \gamma(i) = j \\ \emptyset, & \text{otherwise.} \end{cases}$$

For $L, L' \in \mathcal{L}_{[n]:k}$, let $P(L, L')$ denote the transition probability from $L$ to $L'$ under the
directing measure $\mu_\Sigma$. By row-column exchangeability of $\Sigma$, we have, for all
permutations $\gamma, \gamma'$ of $[k]$,

$$P(L, L') = P(L, \Gamma L') = P(\Gamma L, L') = P(\Gamma L, \Gamma' L')$$

for every $L, L' \in \mathcal{L}_{[n]:k}$. It follows immediately that the transition probability $Q = P \Pi_n^{-1}$ of
the projected chain $\Pi_n(X)$ is given by

$$Q(\Pi_n(L), \Pi_n(L')) = k^{\downarrow \#\Pi_n(L')} P(L, L'), \quad \text{for every } L, L' \in \mathcal{L}_{[n]:k}.$$

□

Following Crane O, we call the induced chain $\Pi := \Pi_\infty(X)$ of an EFCP chain with RCE
directing measure $\Sigma$ a homogeneous cut-and-paste chain.

If the chain $\{X_m\}_{m \ge 0}$ is ergodic, then its unique stationary distribution is invariant under
permutations of $[k]$, since its transition probability matrix is, and therefore projects via $\Pi_n$
to a stationary distribution for the projected chain $\{\Pi_n(X_m)\}_{m \ge 0}$. The sufficiency principle
(equation (O) for total variation distance (see also Lemma 7.9 of [9]) implies that the rate
of convergence of the projected chain $\{\Pi_n(X_m)\}_{m \ge 0}$ is bounded by that of the original chain
$\{X_m\}_{m \ge 0}$. Theorem 5.4 provides a bound for this convergence when the chain $\{X_m\}_{m \ge 0}$ is an
EFCP.

Corollary 6.2. Assume that $\{X_m = X^{[n]}_m\}_{m \ge 0}$ is an EFCP chain on $[k]^{[n]}$ whose paintbox measure
$\Sigma$ is RCE and satisfies the hypothesis of Theorem 5.4 (in particular, the random matrix products
$Q_m$ asymptotically collapse the simplex $\Delta_k$). Then for a suitable constant $K = K_\Sigma < \infty$ depending
only on the distribution $\Sigma$ of $S_1$, and for any $\varepsilon > 0$, the mixing times $t^{(n)}_{\mathrm{mix}}(\varepsilon)$ of the projected chain
$\{\Pi_n(X_m)\}_{m \ge 0}$ satisfy

$$t^{(n)}_{\mathrm{mix}}(\varepsilon) \le K \log n$$

for all sufficiently large $n$.

Theorem 6.3. Suppose $\Sigma$ is a row-column exchangeable probability measure on $\mathcal{S}_k$. Let $X$ be a
$CP_n(\mu_\Sigma)$ chain and let $Y = \Pi_n(X)$ be its projection into $\mathcal{P}_{[n]:k}$. Let $t_X(\varepsilon)$ and $t_Y(\varepsilon)$ denote the
$\varepsilon$-mixing times of $X$ and $Y$ respectively. Then

$$t_X(\varepsilon) = t_Y(\varepsilon).$$

In particular, if $l(\varepsilon, n) \le t_X(\varepsilon) \le L(\varepsilon, n)$ are lower and upper bounds on the $\varepsilon$-mixing times of $X$,
then

$$l(\varepsilon, n) \le t_Y(\varepsilon) \le L(\varepsilon, n),$$

and vice versa. Moreover, $X$ exhibits the cutoff phenomenon if and only if $Y$ exhibits the cutoff
phenomenon.
Proof. If $\pi$ is the stationary distribution for $X$, then $\pi \Pi_n^{-1}$ is the stationary distribution of
$Y$. The rest follows by the preceding discussion regarding sufficiency of $\Pi_n(X)$ and the
sufficiency principle (Q. □

Corollary 6.4. Assume that the paintbox measure $\Sigma$ is row-column exchangeable and satisfies
Hypothesis 4.8, and let $\{X_m\}_{m \ge 0}$ be the EFCP on $[k]^{[n]}$ with associated paintbox measure $\Sigma$. Then
the projected $CP_n(\mu_\Sigma)$ chain $\Pi_n(X)$ exhibits the cutoff phenomenon at time $\theta \log n$, where
$\theta = -1/(2 \log \lambda_1)$.